libtmrm2, Another C-library for TMRM
Ever since I met Jan Schreiber in Oslo a while back I had this itch that someone should write a C library for TMRM. Not only to host maps in proxy form, but more importantly to have a fast evaluation mechanism for myTMQL path expressions.
I was waiting for an opportunity to pack this into one of the paid projects, but that opportunity never arose. Which left me with only small time windows to take a stab at it.
Jan's and Reidar's libtmrm
What helped was an alpha release of libtmrm. I took a hard and long look at it to see how far my ideas deviated from Jan's and Reidar's approach. But in the end decided to go my own path. Also because I have my own agenda and don't want to impose this onto them.
libtmrm's API is quite big, relative to the size of TMRM itself. Here I really tried to keep the amount of functions low, especially have no casting or conversion functions. What helped here was to stick to the mathematics and to shift some burden by using C++ instead of C.
No Store, No IO
I refrained from offering any backend storage, so all data currently lives in memory only. It is the responsibility of the (Perl/Python/...) program to acquire (or parse). My reasoning is that I already have much IO functionality in my Perl packages. No need to repeat that exercise in C.
This does not mean that I will not follow libtmrm's tracks at some point and implement a persistent store on BerkeleyDB, or more probable on top of AllegroGraph.
Lazy & Eager Evaluation
Whenever in the API a list is returned as function result, I will offer always an eager and a lazy variant. The eager solution will instantly materialize the list. With that I can perfectly satisfy language bindings to Perl, Python, etc. For more functional languages, or those working with iterators I offer the lazy variant.
That combined with pooled memory allocation seems to work quite well.
Apart from the map/proxy stuff, TMRM offers also some functionality in terms of merging. Here I have only implemented the minimal merging with that bowtie function. Remember that function would take two proxies and returned either NULL if they will not merge, or a new merged proxy if they do merge.
Also the taxometric part (isa, iko, ....) I implemented, but only to some extent, and not in an overly performant way.
Path Expression Evaluation
What I was most interested in, of course, was to implement the TMRM Annex A (Path Expressions). What I effectively had to do was to port the existing tuple machinery in TM::QL::TS to C. As Perl and C are not so different in that procedural aspect, this was rather straightforward.
More effort was to port TM::QL::PE to C. That involved the tree-like data structure for the path expressions themselves, but also the optimizer.
What that does is to simplify expressions by trying to apply optimization rules. As I have written before, I had been foolish enough (in 2006) to implement this optimizer (a pattern matcher effectively) in pure Perl. A much better approach is to do the following:
- take the path expressions as they come out of the TMQL parser,
- convert this tree into a DOM tree,
- apply XSLT sheets which have these transformation rules encoded,
- convert the result DOM tree back into a path expression tree,
And evaluate that one. Getting all the rules into XSLT sheet form is a painful exercise, though. One which I have not followed through yet.
So far, this has been mostly a fishing expedition to test the viability of the whole setup. I will have to wait for another time window in between projects to walk through the whole code base, add more test cases, and to see how the bindings towards Python, Perl and Ruby will pan out.
And there is always hope that an appropriate project will pop up. And the hope dies last.