Perl TM Tutorial: As Low as it Gets (Part I)

This section covers the lowest-level data structure of the Perl TM package. That layer may not be overly comfortable, but it is useful to know if you plan to do something serious with the distribution, such as virtualization.

Creating a Topic Map

To create an empty topic map, there is really not much to it except loading the module and using a constructor:

use TM;
my $tm = new TM;

The map is not really empty, though. The constructor also loads the infrastructure, i.e. some necessary topics and associations. Necessary, for instance, are isa and is-subclass-of, together with the topics they involve: instance and class for the former, subclass and superclass for the latter.

Some associations are preloaded as well, such as those stating that an occurrence is a characteristic, that a name is one too, and that a characteristic is nothing other than a specialized association.

If you are curious (or close to despair during debugging), you might want to look at the data structure itself:

use Data::Dumper;
warn Dumper $tm;

Low-Level Data Structure

The dump shows two hashes: one containing topic-like information, the other containing association-like information.

The topic-like information consists only of addressing information, namely

  • the subject address (aka subject locator), if one exists,
  • the subject indicators (aka subject identifiers), if any exist,
  • and an internal identifier, which always exists for each toplet.

The TM packages call these structures toplets. Assertions are everything else in the map:

  • type information
  • subclass information
  • names and occurrences
  • and general associations

We will deal with assertions in a later installment.

Toplets

To list all predefined toplets, use the method of the same name, toplets:

my @all = $tm->toplets;

It returns the complete toplet structures, which may be more than you want. To get only their local identifiers (LIDs), simply project that component:

my @lids = map { $_->[TM->LID] } $tm->toplets;

The toplets method also understands a simple query language that lets you specify what you want:

my @mine = $tm->toplets (\ '+all -infrastructure');

As we have not added any toplet ourselves, that list will be empty. But not for long. Let us add one toplet:

$tm->internalize ('cat');

and have a look at the list again:

warn Dumper [ $tm->toplets (\ '+all -infrastructure') ];

[
  [
    'tm://nirvana/cat',
    undef,
    []
  ]
];

The output shows that cat has been used as part of the local identifier, prefixed by tm://nirvana/. That string is the base URI of the map; it is used by default unless you provide your own prefix when creating the topic map:

my $tm = new TM (baseuri => 'http://zen.s.or/');

The subject address component of cat is undefined, and the list of subject indicators is empty as well:

my ($t) = $tm->toplets ('tm://nirvana/cat');
warn Dumper $t->[TM->INDICATORS];

Above we used toplets again, but this time by providing the full LID (local identifier) of what we wish to get returned.
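
The index constants used above can be pictured with a small stand-alone sketch. It does not load TM; it assumes, based on the dumped structure, that a toplet is a three-element array holding the LID, the subject address and the list of indicators, in that order. The package name Sketch and the constant name ADDRESS are assumptions made for illustration; only LID and INDICATORS appear in this tutorial.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the index constants; TM itself is not loaded.
package Sketch {
    use constant { LID => 0, ADDRESS => 1, INDICATORS => 2 };
}

# Two toplets in the three-slot shape described in this tutorial.
my @toplets = (
    [ 'tm://nirvana/cat', undef, [] ],
    [ 'isa', undef, [ 'http://psi.topicmaps.org/sam/1.0/#type-instance' ] ],
);

# Project the LID component of every toplet.
my @lids = map { $_->[Sketch->LID] } @toplets;

# Count the subject indicators attached to every toplet.
my @counts = map { scalar @{ $_->[Sketch->INDICATORS] } } @toplets;

print "@lids\n";
print "@counts\n";
```

Calling a constant as a class method, as in Sketch->LID, mirrors the TM->LID notation: the array slots get names without turning the toplet into a hash.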

Adding the base URI manually is of course cumbersome, so there is another method which tries to figure out the toplet's LID:

my ($t) = $tm->toplets ($tm->tids ('cat'));

If no such toplet exists, undef is returned.

Subject Identifiers and Addresses

To add subject indicators or a subject address to a toplet, pass them to internalize. Subject indicators are provided as string references (a subject address, shown further below, as a plain string):

# two subject indicators
$tm->internalize ('cat' => \ 'http://en.wikipedia.org/wiki/Cat');
$tm->internalize ('cat' => \ 'http://sigma.ontologyportal.org:4010/sigma/Browse.jsp?lang=EnglishLanguage&kb=SUMO&term=DomesticCat');

If my cat, the famous Sacklpicka, had a URL, then

$tm->internalize ('sacklpicka' => 'http://sacklpicka.devc.at/dev/brain');

would be a perfect subject address. But, contrary to all rumors, Sacklpicka is not online yet.
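
The calling convention used here, a string reference for a subject indicator and a plain string for a subject address, can be illustrated without TM. The following is only a minimal sketch: classify_arg is a hypothetical helper, not part of the TM API, and it merely inspects its argument with Perl's ref function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TM): report how an argument would be
# read under the convention described in this section.
sub classify_arg {
    my ($arg) = @_;
    return 'none'      unless defined $arg;
    return 'indicator' if ref($arg) eq 'SCALAR';   # \ 'http://...'
    return 'address'   if !ref($arg);              # 'http://...'
    return 'unknown';                              # any other reference
}

my $as_indicator = classify_arg(\ 'http://en.wikipedia.org/wiki/Cat');
my $as_address   = classify_arg('http://sacklpicka.devc.at/dev/brain');

print "$as_indicator $as_address\n";
```

The backslash is easy to overlook, so it pays to double-check which of the two forms a call actually uses.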

Sacrosanct Identifiers

If you browse through the map data structure you will notice that while cat and sacklpicka have been prefixed, other identifiers such as isa or class have not.

[
   'tm://nirvana/sacklpicka',
   undef,
   []
],
[
   'isa',
   undef,
   [
     'http://psi.topicmaps.org/sam/1.0/#type-instance',
     'http://www.topicmaps.org/xtm/core.xtm#class-instance'
   ]
],
....

These identifiers are called sacrosanct as they represent fixed concepts from the Topic Maps definitions. You can find all these identifiers in (the source of) TM::PSI.
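
The naming rule can be modelled in a few lines. This is a sketch of the observable behaviour only, not TM's actual code: model_lid is a hypothetical function, and the sacrosanct set below contains just the two identifiers named in the text rather than the full list from TM::PSI.

```perl
use strict;
use warnings;

my $baseuri = 'tm://nirvana/';

# Only the two sacrosanct identifiers mentioned in the text; the real
# set lives in TM::PSI and is not reproduced here.
my %sacrosanct = map { $_ => 1 } qw(isa class);

# Hypothetical model of the rule: sacrosanct identifiers stay as they
# are, everything else is prefixed with the base URI.
sub model_lid {
    my ($id) = @_;
    return $id if $sacrosanct{$id};
    return $baseuri . $id;
}

my @lids = map { model_lid($_) } qw(cat sacklpicka isa class);
print join("\n", @lids), "\n";
```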

SIDs vs. SLOs

Wouldn't it be more comfortable if subject locators require the "\" prefix (IIRC it is the Perl notation for a reference)? Subject identifiers are more common, so you'd have to type less.
Further, it would be consistent with AsTMa=, TMQL, and CTM where an IRI without any additional information (such as prefixes) is interpreted as subject identifier.

Maybe a critical "API" change, though.

Regards,
Lars

Lars Heuer (not verified) | Fri, 03/28/2008 - 14:57

Re: SIDs vs. SLOs

Wouldn't it be more comfortable if subject locators require the "\" prefix (IIRC it is the Perl notation for a reference)?

That's really scary: When I wrote that, I was thinking "I bet lheuer would want it exactly the other way round".

But, yes, that train has left the station. Definitely version 2.x stuff.

rho | Fri, 03/28/2008 - 15:20