the joys of parsing

Over the last couple of months I’ve heard different people in different contexts say they want a Python routine to parse time text. Evidently routines to parse text like “5 minutes from now” or “next wednesday” are not found in the Python world, though some can be found in other languages. I have plenty of code that parses time and date text from work - all of it proprietary and in Delphi :( But that allows me to have fun porting it to Python! So I started working on timeparse.py. Currently it handles only the most basic formats, and I’m slowly working on it to handle more every night ;)

Parsing human-readable time and date text is a fun mix of styles - you need to extract not only the literal values but also infer patterns just from the ordering and/or relation of the words. For example: “next week” seems very obvious to a human but is missing a ton of information for the computer - all of it the human infers. To handle that, and other cases, you need to pick sane defaults and also recognize certain edge cases.

Now I’m not an academic, so you won’t find me going through the differences between this style of parsing or that, but I figure that I should post some thoughts as I work them out. Anywho, more info tomorrow - I just realized it’s 0200 hrs!
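To make the "sane defaults" idea a bit more concrete, here is a rough sketch of how a few of these phrases could be handled. This is not the timeparse.py code. The function name `parse_relative`, the lookup tables, and the defaults are all assumptions made up for illustration: "next week" is taken to mean the same weekday and time seven days on, and "next wednesday" to mean the first Wednesday strictly after today.

```python
import re
from datetime import datetime, timedelta

# Hypothetical tables for this sketch; a real parser would cover far more.
UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours",
         "day": "days", "week": "weeks"}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_relative(text, now=None):
    """Return a datetime for a few simple relative phrases, or None."""
    now = now or datetime.now()
    text = text.strip().lower()

    # "5 minutes from now": every value needed is literally in the text.
    m = re.fullmatch(r"(\d+)\s+(second|minute|hour|day|week)s?\s+from\s+now", text)
    if m:
        return now + timedelta(**{UNITS[m.group(2)]: int(m.group(1))})

    # "next week": nothing literal to extract, so fall back on a default.
    # Assumed default: same weekday and time of day, seven days later.
    if text == "next week":
        return now + timedelta(weeks=1)

    # "next wednesday": assumed to mean the first such weekday strictly
    # after today, keeping the current time of day.
    m = re.fullmatch(r"next\s+(\w+)", text)
    if m and m.group(1) in WEEKDAYS:
        target = WEEKDAYS.index(m.group(1))
        days_ahead = (target - now.weekday() - 1) % 7 + 1  # always 1..7
        return now + timedelta(days=days_ahead)

    # Unrecognized phrase: let the caller decide what to do.
    return None
```

Passing `now` in explicitly keeps the function easy to test, and each default sits in one obvious place, so changing what "next week" means later is a one-line edit.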

