TMRM Exegesis: Proxies

Those of you who got curious from my ultrashort intro into the Topic Maps Reference Model may want follow me here when I write a bit about my interpretation of the TMRM.

Best is to forget topics, names, occurrences and associations as defined in TMDM for the moment. TMRM abstracts all of these into one, atomic concept: the proxy. We will return to TMDM later.

Here again is the proxy spider:

http://kill.devc.at/system/files/spider-small.jpg

A proxy always represents a statement about the subject world and in the manner it does that it incorporates much what the TM paradigm itself is all about:

  • A proxy arranges any number of subjects (actually their proxies, of course) into one larger statement. These are the values v1, v2, v3 and v4 above. The values can be proxies for other subjects. Or they can be primitive values coming from some predefined data set, such as INTEGER or STRING.
  • To make it unambiguously clear which role each value plays in such a statement, the keys k1, k2, k3 and k4 are used. Keys can only be proxies, never primitive data values.

Consequently a particular proxy can be involved either as value or as key in another proxy. And one and the same proxy can be involved in any number of other proxies.

Labelling

For technical reasons (infinite data structures come with some theoretical problems), the TMRM introduced labels for proxies. It is actually them, and not the proxy itself which is used as key or as value. The only assumption here is that

  • every proxy gets such a unique label stapled onto its forehead, and
  • if you hold such a label in your hand, there is some dereferencing machinery to give you the appropriate proxy.

Implementing Proxies

If you wanted to implement such a structure in, say, Ruby then your first attempt is to use a hash (aka dictionary):

{
  shoesize => 42^^xsd:integer,
  beard    => "long",
  knows    => rho
}

shoesize, beard, knows and rho would be labels for their respective proxies, the literal 42^^xsd:integer the convoluted way to express the integer 42 and the literal "long" some string.

The only problem with that is that these hashes allow to use one specific key only exactly once, whereas TMRM is without that limitation. It would allow perfectly well:

{
  shoesize => 42^^xsd:integer,
  beard    => "long",
  beard    => "white",
  knows    => rho
}

There might be a number of clever ways to represent this in memory. I ended up to abandon hashes (they are quite memory-intensive anyway) and encoded this as two parallel arrays:

[ beard,  beard,   knows, shoesize ]
    |       |        |        |
    |       |        |        |
    v       v        v        v
[ "long", "white", rho,   42^^xsd:integer ]

As you can see, I have sorted the whole thing according to the keys (how precisely does not matter), so that I can reliably create a hash value from all the above. That MD5 value I use as label for the proxy. This definitely speeds up proxy comparison.

Bootstrapping and Equality

Two proxies are the same if they have exactly the same properties, i.e. key/value pairs. If you add a new property, this will become a different proxy. Whether the two are still to be regarded the same can only be decided on a semantic level. This is where merging will kick in later.

If you take away properties from a given proxy, then also here you just generate different proxies. Until you reach a proxy without any keys, i.e. the empty property set. Mathematicians like to call such a thing bottom, because it marks the simplest thing possible.

It may appear that such a proxy is completely useless as it never carries any information. Quite to the contrary, the bottom element is useful and actually necessary to build other primitive proxies.

Returning to the example above, we might want to define the proxies for beard or rho:

beard = { bottom => 1 }

rho   = { bottom => 2 }

In that, we used the always existing bottom proxy together with some unique values (integers are just fine for that purpose) to disambiguate any number of proxies. Apart from being different there is not much flair about them. But at this moment this is maybe not our main concern; we can add properties to them later.

No Types, No Scope

You will also notice that the TMRM does not include any special features into a proxy, such as typing or scoping information. From TMRM's point of view, all such concepts are imposed by a particular interpretation of the TMRM framework. Such an interpretation is defined in a legend, and that defines how typing information (among other things) is actually represented.

And, yes, TMDM is such an interpretation.

AttachmentSize
spider-small.jpg3.1 KB
Posted In

Just one thing

How do we determine whether a proxy key or value is another proxy label? Or are we meant to cross-check every single key and value against possible proxy labels? (Seems like a waste both in terms of performance and cycles that every name or occurrence value [to just pick two TMDM concepts] needs to be checked for being a possible proxy label ...)

Alexander (not verified) | Fri, 12/14/2007 - 14:36

Re: Just one thing

How do we determine whether a proxy key or value is another proxy label?

The TMRM makes sure that proxy labels come from a dedicated set, so they are well distinguishable from other values, such as INTEGERs or DECIMALs.
Then, just for clarification of the above, proxy keys MUST always be proxy labels.

Or are we meant to cross-check every single key and value against possible proxy labels?

I am guessing here that your concern is how to move from a proxy a to all other proxies b where a is used as key or as value? Please clarify!

Seems like a waste both in terms of performance and cycles that every name or occurrence value [to just pick two TMDM concepts] needs to be checked for being a possible proxy label ...

Not sure what you mean here. It is true that each TMDM name and each TMDM occurrence is mapped into one (actually more) proxy. But from a performance perspective the task to find your way through that maze simplifies. You only need indices for a limited number of value combinations. TMQL implementations will make heavy use of these axes.
rho | Fri, 12/14/2007 - 16:28

Re: Just one thing

Well, you state that all keys are proxy labels, and that's fine; there's ways to index aqnd use these to find graphs. But you also state that a value *can* be a proxy label, so I'm interested in the mechanism for doing that efficently and without calculating checksums for ever possible value your systems might have (which often can be more than keys and labels alltogether.)

You state that the proxy labels come from a defined set, which again is fine, but wouldn't this mean that you need to cross-check all proxy labels against all values to find out that relationships?

Alexander (not verified) | Fri, 12/14/2007 - 23:39

Re: Just one thing

But you also state that a value *can* be a proxy label, ....

What TMRM says (I hope /me too) is that labels are just a special sort of values, yes.

... so I'm interested in the mechanism for doing that efficently and without calculating checksums for ever possible value your systems might have (which often can be more than keys and labels alltogether.)

To answer that, one has to consider what one actually wants to achieve. The model (TMRM) itself should only make implementations possible, but should not say anything about them.

So let us assume that there is a proxy

{ shoesize => 42, person => rho }

shoesize, person and rho are proxy labels and 42 obviously one integer.

If you started with an integer value and you wanted to know all statements (proxies) where this value is used as shoesize, then you would write (TMQLish) 34 <- shoesize .

To make this navigation step fast you could create an index over the map which maps integer values to proxies using that value as shoesize. Very fast because dedicated. You can then generalize your index to do this not only for shoesize, but for other keys as well.

You state that the proxy labels come from a defined set, which again is fine, but wouldn't this mean that you need to cross-check all proxy labels against all values to find out that relationships?

This is all implementation. If you implement a proxy as an in-memory data structure, then the label might be the C++ pointer to the proxy. And then a lookup is as fast as the page is swapped into memory. If you implement proxies in a flat text file, then you will have to parse and scan through the whole file with every label dereference. Works, but very slow.

I'm still not sure whether I have answered your question.

rho | Sat, 12/15/2007 - 14:10