making parsedatetime data-driven when parsing dates
Well sure, it’s being used by Chandler, but I mean now it has non-OSAF users, and one of them has actually filed an “issue” (code.google.com-speak for a bug). In the issue he (Alan) wants parsedatetime to support “Aussie” date formats, i.e. dd-mm-yyyy.
I’ve always had it in my head that this kind of support would be necessary and knew that I would be doing the code sooner or later. Looks like it was sooner ;)
With the changes I recently made to support PyICU and locales in general a lot of the hard work has already been done. What was left was figuring out how to convert Darshana’s parseDate() code (she had extracted some of the code I was using in a couple different spots and made it into a function) into something that could be data-driven.
First I needed to solve the issue of how to figure out programmatically what the order is. Sure, for the pdtLocale classes I create manually when PyICU isn’t available I can just specify the order, but I also wanted to be able to figure it out for PyICU. Since I had already come up with a way of extracting the time separator and, optionally, the meridian text from the short time format, I figured I could use the same trick for dates.
Here’s the code that extracts the order from PyICU’s short date format:
    # grab the ICU date format class for 'short'
    o = ptc.icu_df['short']
    # ask ICU to build a date string using the given datetime
    s = o.format(datetime.datetime(2003, 10, 30, 11, 45))
    # because I used unique values, I can replace them with ''
    # which *should* leave only the date separator
    s = s.replace('10', '').replace('30', '')
    # extract the separator, or default if nothing found
    if len(s) > 0:
        ds = s[0]
    else:
        ds = '/'
    ptc.dateSep = [ ds ]
    # now that I have the date separator
    # parse the short date format string
    s = ptc.dateFormats['short']
    l = s.lower().split(ds)
    dp_order = []
    for s in l:
        if len(s) > 0:
            dp_order.append(s[:1])
    ptc.dp_order = dp_order
The above code will produce ['m', 'd', 'y'] for enUS and ['d', 'm', 'y'] for enAU.
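For anyone who wants to try the idea without PyICU installed, here is a minimal standalone sketch of the same two steps. The helper names date_separator and date_part_order are made up for this example, strftime stands in for ICU's formatter, and the pattern strings are illustrative stand-ins rather than what ICU actually hands back for a given locale.

```python
import datetime

def date_separator(formatted, default='/'):
    """Strip the known month (10) and day (30) digits from a formatted date;
    the first character left over should be the separator."""
    leftover = formatted.replace('10', '').replace('30', '')
    return leftover[0] if leftover else default

def date_part_order(pattern, sep):
    """Split a date pattern on the separator and keep the first letter
    of each non-empty field, lowercased."""
    return [part[:1] for part in pattern.lower().split(sep) if part]

# same sample date as above: unique month and day values
sample = datetime.datetime(2003, 10, 30, 11, 45)

# strftime is only a stand-in for ICU's short date formatter here
us_text = sample.strftime('%m/%d/%y')
au_text = sample.strftime('%d-%m-%y')

us_sep = date_separator(us_text)
au_sep = date_separator(au_text)

# hypothetical short date patterns, assumed for illustration
print(date_part_order('M/d/yy', us_sep))      # ['m', 'd', 'y']
print(date_part_order('dd-MM-yyyy', au_sep))  # ['d', 'm', 'y']
```

Note that the trick depends on the sample date: the month and day have to be two distinct two-digit values that cannot be confused with each other or with the year digits.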
With the order determined, the following code takes the values returned from the regexes and assigns them to the month, day and year:
    # v1, v2 and v3 are initialized to -1
    # this lets 0 values in the text pass thru
    # so they are flagged as errors downstream
    v = [ v1, v2, v3 ]
    d = { 'm': mth, 'd': dy, 'y': yr }
    # run thru the dp_order list in sequence
    # and replace the value in d if it's not -1
    for i in range(0, 3):
        n = v[i]
        c = self.ptc.dp_order[i]
        if n >= 0:
            d[c] = n
    mth = d['m']
    dy = d['d']
    yr = d['y']
Pretty nifty if I may say so myself :) but I want to run it by people like Phillip (pje) to really find out if it’s a good Python algorithm.
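To poke at the reordering step in isolation, it can be pulled out into a small function. This is only a sketch: assign_date_parts and the defaults dict are hypothetical names for this example and are not part of parsedatetime, and the fallback values are made up.

```python
def assign_date_parts(values, order, defaults):
    """Map positional values onto month/day/year using the locale's order.

    values   -- three ints in the order they appeared in the text;
                -1 means "nothing was found in this position"
    order    -- e.g. ['m', 'd', 'y'] or ['d', 'm', 'y']
    defaults -- dict of fallback values keyed by 'm', 'd' and 'y'
    """
    parts = dict(defaults)
    for key, value in zip(order, values):
        # 0 is let through on purpose so it can be flagged as an error later
        if value >= 0:
            parts[key] = value
    return parts['m'], parts['d'], parts['y']

# made-up fallback values, for illustration only
defaults = {'m': 1, 'd': 1, 'y': 2006}

# "30-10-2003" read with an Aussie-style day/month/year order
print(assign_date_parts([30, 10, 2003], ['d', 'm', 'y'], defaults))  # (10, 30, 2003)

# "10/30" read with a US-style order; the year falls back to the default
print(assign_date_parts([10, 30, -1], ['m', 'd', 'y'], defaults))    # (10, 30, 2006)
```

Using zip instead of indexing with range(0, 3) is just a stylistic choice; the behaviour is meant to match the loop above.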