CTM, TMQL: Let's define 'undef'

One of the TMQL issues is about the role of undefined and its relationship to null in TMQL.

Here I would like to give some background on why null and undef are two different things, and why I think that undef belongs not only in TMQL but also in CTM.

TMQL Connection

Let us consider a simple query:

select $p / homepage
where
   $p isa Person

Here the TMQL processor is instructed to find all instances of Person, bind these one by one to the variable $p and then find for each person all its homepage occurrences.

If a person had exactly one homepage URL, then we would see that in the result list. If he had two, we would see both. If he had none, we would see nothing. But that may be exactly what we wanted in the first place.

If we would like to record the person's name as well, we can extend the query to:

select $p / name, $p / homepage
where
   $p isa Person

Again, the same mechanism of grouping is used: for every instance of Person the expression

$p / name, $p / homepage

is evaluated. If we assume for the moment that every person has exactly one name, then for persons with exactly one homepage we will also see exactly one result tuple, say:

"The Lying Rodent", "http://www.john-howard.com.au/"

If Mr. Howard had two homepages, then we would get

"The Lying Rodent", "http://www.john-howard.com.au/"
"The Lying Rodent", "http://www.tampa-incident.gov.au/"

But if the most honorable ex prime minister had not a single homepage, the list would be empty, and we would see no trace of him.
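The disappearing act can be reproduced outside of TMQL. The following Python sketch is only an illustration, not how a TMQL processor works internally: the `people` dictionary, the `some-blogger` entry with its example.org URLs, and the function `select_tuples` are all made up, and the projection is modelled as a plain cross product of the value lists.

```python
from itertools import product

# Hypothetical toy data: each person has a list of names and a list of homepages.
people = {
    "john-howard": {
        "name": ["The Lying Rodent"],
        "homepage": [],  # no homepage known
    },
    "some-blogger": {
        "name": ["Some Blogger"],
        "homepage": ["http://example.org/a", "http://example.org/b"],
    },
}

def select_tuples(person, columns):
    """Build the result tuples for one person as the cross product of its columns."""
    return list(product(*(person[c] for c in columns)))

for pid, person in people.items():
    print(pid, select_tuples(person, ["name", "homepage"]))
```

A single empty column makes the whole product empty, which is why a person without homepages leaves no trace in the result.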

Maybe a Howard-free result is what we wanted. But maybe we would like to record this very fact differently:

  • "If there are homepages, then show them. But if the list is empty, then use something to indicate that there is, uhm, something."

This is where undef steps in. It can represent the "something is there, but I cannot say what it is" part:

select $p / name, $p / homepage || undef
where
   $p isa Person

Above I used it together with the || operator. That takes care of the "if there is something, use that. If not, use the expression on the right" part.

With no J.H. homepages we would then expect:

"The Lying Rodent", undef

In Perl undef would be modelled as undef (what a coincidence!), in Python it is None, C++ has NULL and in Jaaaaaavaaaaaa it would be null, or so.

Of course, you could also use a particular string as in

$p / homepage || "none known"

but then everyone who gets the results must know how to recognize this very string. undef is a standardized way to flag undefinedness.
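To make the difference between a magic string and a standard marker concrete, here is a small Python sketch. It takes the mapping of undef to Python's None literally; the helper `or_else` is a made-up stand-in for the || operator, and the cross product is only an assumed model of how the result tuples are built.

```python
from itertools import product

def or_else(values, fallback):
    """Mimic `values || fallback`: keep the values if there are any, else use the fallback."""
    return values if values else [fallback]

names = ["The Lying Rodent"]
homepages = []  # nothing known

# None plays the role of undef here.
rows = list(product(names, or_else(homepages, None)))

for name, homepage in rows:
    # A consumer tests for the marker itself, not for an agreed-upon string.
    shown = "(undefined)" if homepage is None else homepage
    print(name, shown)
```

The consumer only has to know the marker of its own language, not a string that somebody picked for one particular query.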

undef is not null

TMQL also has a special constant null.

It is a synonym for () and basically means "nothing there". And "nothing there" is not the same as "something is there".
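In Python terms the distinction could be sketched like this. The variable names are made up, and using None for undef is only a stand-in:

```python
null = ()     # the empty sequence: nothing there
undef = None  # stand-in for undef: a single value saying "something is there"

no_homepage = null           # zero values
unknown_homepage = (undef,)  # exactly one value, which happens to be undef

print(len(no_homepage), len(unknown_homepage))
```

The first sequence contributes no tuples to a result, the second contributes exactly one.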

CTM Tangent

Suppose we need to write a topic map about persons and want to record their homepages there.

What would you do to record "there is a homepage, I am sure, but I do not (yet) know what it is"? The known unknowns, so to say.

CTM does not yet allow you to write

john-howard isa Person
  homepage: undef

But maybe it should:

  • You cannot leave the occurrence value blank (or can you?), and
  • if you omit the homepage occurrence altogether, that would not reflect your intentions. And a TMCL validator enforcing at least one homepage occurrence would complain.
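The validator argument can be illustrated with a toy check. This is not TMCL and not any real validator: the function `check_min_occurrences`, the dictionary layout and the `UNDEF` marker are all hypothetical, and the sketch assumes that an undef occurrence would count as an occurrence.

```python
UNDEF = object()  # hypothetical marker for "a value exists, but it is unknown"

def check_min_occurrences(topic, occ_type, minimum=1):
    """Return True if the topic has at least `minimum` occurrences of `occ_type`."""
    return len(topic.get(occ_type, [])) >= minimum

# The homepage occurrence is omitted altogether.
omitted = {"isa": ["Person"]}

# The homepage occurrence is present, but its value is undef.
marked = {"isa": ["Person"], "homepage": [UNDEF]}

print(check_min_occurrences(omitted, "homepage"))
print(check_min_occurrences(marked, "homepage"))
```

Under that assumption only the topic with the omitted occurrence fails the cardinality check.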

Work supported by the Austrian Research Centers.

undef in TMDM

As CTM describes TMDM instances, what is the TMDM representation of an occurrence with a [value] field set to undef?
What are the merging rules?

Is

john-howard isa Person
  homepage: undef
  homepage: "http://www.john-howard.com.au/"

merged into
john-howard isa Person
  homepage: "http://www.john-howard.com.au/"

?

Note that while you cannot have undefined occurrences (in the sense of "its existence is asserted, but it is not described any further"), you certainly can have undefined topics playing a role in an association. So

has-homepage(who: john-howard, homepage: some-homepage)
some-homepage isa homepage

is actually possible.

This is actually a little inconsistent, as TMDM 5.6 says "Occurrences are essentially a specialized kind of association".

Maybe we should think about a standardized unfolding operation, unfolding a TMDM instance with names, occurrences and associations into a TMDM instance with just occurrences and associations, and unfolding a TMDM instance with occurrences and associations into a TMDM instance with just associations.
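A rough Python sketch of what such a two-step unfolding might look like. Nothing here is standardized: the dictionary layout, the role names "topic" and "value", the occurrence type "name" and both function names are invented for illustration, and the sketch leaves open how the string values would finally be embedded.

```python
def unfold_names(topic):
    """Step 1: rewrite every name as an occurrence of the (made-up) type 'name'."""
    occurrences = list(topic["occurrences"])
    occurrences += [("name", value) for value in topic["names"]]
    return {"id": topic["id"], "names": [], "occurrences": occurrences}

def unfold_occurrences(topic):
    """Step 2: rewrite every occurrence as a binary association."""
    return [
        {"type": occ_type, "roles": {"topic": topic["id"], "value": value}}
        for occ_type, value in topic["occurrences"]
    ]

topic = {
    "id": "john-howard",
    "names": ["The Lying Rodent"],
    "occurrences": [("homepage", "http://www.john-howard.com.au/")],
}

associations = unfold_occurrences(unfold_names(topic))
for assoc in associations:
    print(assoc)
```

After both steps the topic is described by associations only, one per former name or occurrence.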

The only way to actually embed strings into such a TMDM instance is by using subject identifiers (or subject locators), which may not be a bad thing per se, because a topicmap engine needing to index only one type of string may be simpler (and even faster) than a topicmap engine needing to index many different types of strings.

This way of unfolding might provide a smooth(er) path between fully fledged TMDM and TMRM.

Xuân Baldauf (not verified) | Thu, 04/03/2008 - 14:15

Re: undef as data type

As CTM describes TMDM instances, what is the TMDM representation of an occurrence with a [value] field set to undef?

The idea is that undef would be the one single value of a newly created data type, say, UNDEF. So from the TMDM viewpoint it is just a value.

What are the merging rules?

Because of the above, I do not see the necessity to change anything in the existing merging rules.

In this sense, the fragment

john-howard isa Person
  homepage: undef
  homepage: "http://www.john-howard.com.au/"

cannot be merged any further.
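A small Python sketch of that reasoning, assuming that occurrences on the same topic collapse only when type, value, datatype and scope are all equal. The class `Occurrence`, the function `merge_occurrences`, the lexical form "undef" and the datatype label "xsd:anyURI" are illustrative choices, not part of any proposal.

```python
from typing import FrozenSet, NamedTuple

class Occurrence(NamedTuple):
    type: str
    value: str
    datatype: str
    scope: FrozenSet[str] = frozenset()

def merge_occurrences(occurrences):
    """Drop duplicates: only occurrences equal in every field collapse into one."""
    return list(dict.fromkeys(occurrences))

occs = [
    Occurrence("homepage", "undef", "UNDEF"),
    Occurrence("homepage", "http://www.john-howard.com.au/", "xsd:anyURI"),
    Occurrence("homepage", "http://www.john-howard.com.au/", "xsd:anyURI"),
]

merged = merge_occurrences(occs)
for occ in merged:
    print(occ.datatype, occ.value)
```

The duplicated URL occurrence collapses, while the undef occurrence survives next to it because its value and datatype differ.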

rho | Thu, 04/10/2008 - 14:06

Maybe we should think

Maybe we should think about a standardized unfolding operation, unfolding a TMDM instance with names, occurrences and associations into a TMDM instance with just occurrences and associations, and unfolding a TMDM instance with occurrences and associations into a TMDM instance with just associations.

This is exactly what my Perl TM system does. It generalizes the association structure so that role players can be not only topics but also literals. Everything can then be mapped into these generalized assertions.

If we standardize something like this, it would go into TMDM 2.0. I'll put it onto my wishlist :-)

rho | Thu, 04/10/2008 - 14:25

Re: strings as subject identifier

The only way to actually embed strings into such a TMDM instance is by using subject identifiers (or subject locators), which may not be a bad thing per se, because a topicmap engine needing to index only one type of string may be simpler (and even faster) than a topicmap engine needing to index many different types of strings.

I would not be too happy with creating topics out of strings.

This way of unfolding might provide a smooth(er) path between fully fledged TMDM and TMRM.

It would reduce the number of steps from 100 to 99! :-)

rho | Thu, 04/10/2008 - 14:46