Bear's Log

making parsedatetime data-driven when parsing dates

07 Sep 2006 08:44 by bear

Well sure, parsedatetime is being used by Chandler, but now it has non-OSAF users, and one of them has actually filed an “issue” (code.google.com speak for a bug). In the issue, Alan asks for parsedatetime to support “Aussie” date formats, i.e. dd-mm-yyyy.

I’ve always had it in my head that this kind of support would be necessary, and I knew I would be writing the code sooner or later. Looks like it was sooner ;)

With the changes I recently made to support PyICU and locales in general, a lot of the hard work was already done. What was left was figuring out how to convert Darshana’s parseDate() code (she had pulled code I was using in a couple of different spots into a single function) into something data-driven.

First I needed to figure out how to determine the date order programmatically. Oh sure, for the pdtLocale classes I create manually when PyICU isn’t available I can just specify the order, but I wanted to work it out for PyICU too. Since I had already come up with a way of extracting the time separator (and optionally the meridian text) from the short time format, I figured I could use the same trick for dates.

Here’s the code that extracts the order from PyICU’s short date format:

    import datetime

    # grab the ICU date format object for 'short'
    o = ptc.icu_df['short']

    # ask ICU to build a date string using the given datetime
    s = o.format(datetime.datetime(2003, 10, 30, 11, 45))

    # because I used unique values, I can replace them with ''
    # which *should* leave only the date separator; the year
    # digits can still be there (and may even come first),
    # so drop any leftover digits as well
    s = s.replace('10', '').replace('30', '')
    s = ''.join(c for c in s if not c.isdigit())

    # extract the separator, or default if nothing found
    if len(s) > 0:
        ds = s[0]
    else:
        ds = '/'

    ptc.dateSep = [ds]

    # now that I have the date separator,
    # parse the short date format string
    pattern  = ptc.dateFormats['short']
    dp_order = []

    for part in pattern.lower().split(ds):
        part = part.strip()
        if len(part) > 0:
            dp_order.append(part[:1])

    ptc.dp_order = dp_order

The above code sets ptc.dp_order to ['m', 'd', 'y'] for enUS and ['d', 'm', 'y'] for enAU.
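To try the idea without PyICU installed, here is a minimal standalone sketch. It assumes ICU-style short patterns of 'M/d/yy' for en_US and 'd/MM/yy' for en_AU (hard-coded, not queried from ICU) and uses strftime with a matching format as a stand-in for ICU's formatter. The LOCALES table, the xx_ISO entry and the function names are all hypothetical; xx_ISO is only there to show why leftover year digits have to be skipped when the year comes first.

```python
import datetime

# Hypothetical locale table: (ICU-style short pattern, equivalent strftime
# format). In parsedatetime the pattern comes from ptc.dateFormats['short']
# and the sample string from PyICU; both are hard-coded here as assumptions.
LOCALES = {
    'en_US': ('M/d/yy', '%m/%d/%y'),
    'en_AU': ('d/MM/yy', '%d/%m/%y'),
    'xx_ISO': ('yyyy-MM-dd', '%Y-%m-%d'),  # hypothetical year-first layout
}

# Month 10 and day 30 are distinct, so each can be removed unambiguously.
SAMPLE = datetime.datetime(2003, 10, 30, 11, 45)


def date_separator(formatted, default='/'):
    """Strip the known month and day, then return the first non-digit left."""
    remainder = formatted.replace('10', '').replace('30', '')
    for ch in remainder:
        if not ch.isdigit():
            return ch
    return default


def date_order(pattern, sep):
    """Return the first letter of each pattern field, e.g. ['m', 'd', 'y']."""
    fields = [field.strip() for field in pattern.lower().split(sep)]
    return [field[:1] for field in fields if field]


def locale_order(name):
    """Return (separator, order) for one entry of the hypothetical table."""
    pattern, strf = LOCALES[name]
    sep = date_separator(SAMPLE.strftime(strf))
    return sep, date_order(pattern, sep)


for name in sorted(LOCALES):
    print(name, locale_order(name))
```

For en_US this yields ('/', ['m', 'd', 'y']) and for en_AU ('/', ['d', 'm', 'y']), matching the results above; the year-first entry yields ('-', ['y', 'm', 'd']).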

With the order determined, the following code takes the values matched by the regexes and assigns them to the month, day and year:

    # v1, v2 and v3 are initialized to -1;
    # this lets 0 values in the text pass thru
    # so they are flagged as errors downstream
    v = [v1, v2, v3]
    d = {'m': mth, 'd': dy, 'y': yr}

    # run thru the dp_order list in sequence
    # and replace the value in d if it's not -1
    for c, n in zip(self.ptc.dp_order, v):
        if n >= 0:
            d[c] = n

    mth = d['m']
    dy  = d['d']
    yr  = d['y']
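The reordering can be seen end to end in a small standalone sketch. The regex, the defaults dict and the function names are hypothetical illustrations, not parsedatetime's actual API. As in the code above, -1 marks a part that was missing from the text, so a missing year falls back to the default while a literal 0 still passes through.

```python
import re

# Hypothetical pattern: two or three numeric parts separated by '-', '/' or '.'
DATE_RE = re.compile(r'^(\d{1,4})[-/.](\d{1,4})(?:[-/.](\d{1,4}))?$')


def assign_parts(values, dp_order, defaults):
    """Map positional values onto 'm', 'd', 'y' using dp_order.

    values:   three ints, -1 meaning "not present in the text"
    dp_order: e.g. ['m', 'd', 'y'] or ['d', 'm', 'y']
    defaults: dict with 'm', 'd' and 'y' keys (e.g. today's date)
    """
    d = dict(defaults)
    for key, n in zip(dp_order, values):
        if n >= 0:          # 0 is kept so it can be rejected later
            d[key] = n
    return d['m'], d['d'], d['y']


def parse_date(text, dp_order, defaults):
    """Return (month, day, year) for text, interpreted via dp_order."""
    match = DATE_RE.match(text)
    if match is None:
        raise ValueError('unrecognised date: %r' % text)
    values = [int(g) if g is not None else -1 for g in match.groups()]
    return assign_parts(values, dp_order, defaults)


today = {'m': 1, 'd': 1, 'y': 2006}
print(parse_date('07-09-2006', ['m', 'd', 'y'], today))  # US order
print(parse_date('07-09-2006', ['d', 'm', 'y'], today))  # Aussie order
```

The same string, 07-09-2006, comes back as (7, 9, 2006) with the US order and (9, 7, 2006), i.e. 7 September, with the Australian order, which is exactly the ambiguity Alan's issue is about. Only dp_order changes between the two calls; the parsing code stays the same.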

Pretty nifty, if I do say so myself :) but I want to run it by people like Phillip (pje) to find out whether it’s a good Python algorithm.
