Perl TM Tutorial: As Low as it Gets (Part IV)

(Follow-up to Part III)

Before we move on to higher-level aspects, there are two important pieces of functionality that deal with topic maps as a whole: merging maps and taking the difference of two maps. Let's look at the latter here, even though it might still be regarded as somewhat experimental.

Map Differences

One of the positive side effects of the extremely flat data structure is that the difference is not only possible to define succinctly, it is also rather efficient to compute.
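To give a feeling for why a flat structure makes this cheap, here is a minimal sketch in plain Perl. It is a toy model and not how TM works internally: I simply assume that a map boils down to a hash keyed by assertion identifiers, and that the same assertion carries the same identifier in both maps. The function toy_diff and the identifiers a1 to a4 are made up for the illustration.

```perl
use strict;
use warnings;

# Toy model (an assumption, not TM's internals): a map is a hash whose keys
# are assertion identifiers and whose values name the topic they belong to.
sub toy_diff {
    my ($new, $old) = @_;
    # one hash lookup per assertion, so the cost grows linearly with map size
    my @plus  = sort grep { !exists $old->{$_} } keys %$new;
    my @minus = sort grep { !exists $new->{$_} } keys %$old;
    return { plus => \@plus, minus => \@minus };
}

my %old = (a1 => 'sacklpicka', a2 => 'sacklpicka', a3 => 'rho');
my %new = (a1 => 'sacklpicka', a3 => 'rho', a4 => 'rho');

my $d = toy_diff(\%new, \%old);
print "plus:  @{ $d->{plus} }\n";    # plus:  a4
print "minus: @{ $d->{minus} }\n";   # minus: a2
```

With a nested structure one would have to walk both trees in parallel; with a flat one, two passes over the keys are enough.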

The function diff operates on two maps and can deliver a data structure representing the differences. While it can work on any two maps, the effect is best demonstrated if the maps do not differ too much.

So let us quickly produce two almost identical maps:

sacklpicka (cat)
bn: Der Sacklpicka
bn: Catbert

rho (person)
oc (blog): http://kill.devc.at/

whitey (unperson)
bn: The White-Haired Man

sacklpicka (cat)
bn: Der Sacklpicka

rho (person)
bn: \rho{}bert
oc (blog): http://kill.devc.at/

sharky (unperson)
bn: Sharkbert

Relative to the first map, the second map has lost one name for sacklpicka but has gained a name for rho. The whitey topic exists only in the first map, just as sharky exists only in the second.

A line diff shows the changes:

sacklpicka (cat)        sacklpicka (cat)
bn: Der Sacklpicka      bn: Der Sacklpicka
bn: Catbert           <

rho (person)            rho (person)
                      > bn: \rho{}bert
oc (blog): http://kil   oc (blog): http://kil

whitey (unperson)     | sharky (unperson)
bn: The White-Haired  | bn: Sharkbert

(Are not line-oriented notations cool?)

If we had now loaded the maps into $tm1 and $tm2, respectively, then the difference

use Data::Dumper;
$Data::Dumper::Indent = 1;
warn Dumper $tm2->diff ($tm1);

would show how $tm1 must be modified to arrive at $tm2.

Characteristics

If a name or an occurrence of a topic changes, this will be listed under the modified key:

'modified' => {
    'tm://nirvana/sacklpicka' => {
      'minus' => [
        '817da726e9ef573407520be7861f9179'
      ]
    },
    'tm://nirvana/rho' => {
      'plus' => [
        '4962a6d33436a7359c333366d33c19bd'
      ]
    }
  },

Accordingly, sacklpicka will lose one assertion holding a name (this is what the minus means), and rho will get one additional assertion holding another.
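If you would rather not read the dump by eye, the modified part is easy to walk. The following is only a sketch: the hash is copied by hand from the output above, so it runs without TM, and describe_modified is a hypothetical helper of mine, not part of the module.

```perl
use strict;
use warnings;

# the 'modified' part of the diff, copied from the dump above
my $diff = {
    modified => {
        'tm://nirvana/sacklpicka' => { minus => ['817da726e9ef573407520be7861f9179'] },
        'tm://nirvana/rho'        => { plus  => ['4962a6d33436a7359c333366d33c19bd'] },
    },
};

# hypothetical helper: one line of text per changed topic
sub describe_modified {
    my ($diff) = @_;
    my @lines;
    for my $topic (sort keys %{ $diff->{modified} }) {
        my $change = $diff->{modified}{$topic};
        # either list may be missing, so fall back to an empty one
        my $gained = @{ $change->{plus}  || [] };
        my $lost   = @{ $change->{minus} || [] };
        push @lines, "$topic: +$gained -$lost";
    }
    return @lines;
}

print "$_\n" for describe_modified($diff);
# tm://nirvana/rho: +1 -0
# tm://nirvana/sacklpicka: +0 -1
```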

Topics Come and Go

When complete topics disappear or emerge going from one map to the other, they will be mentioned separately:

'plus' => {
    'tm://nirvana/sharkbert' => [
      '4f630424825d71832ec3abb49a99fab5',
      'e0bd658e93988c7de5710a9baa9a786b'
    ]
  },
  'minus' => {
    'tm://nirvana/whitey' => [
      '918e8b22537b7286640384db87d0a072',
      '137b04603d9e823e8278cc02bdb5b7de'
    ]
  },

Both sharkbert and whitey are listed with two assertions each: one holds the topic's name and the other the fact that there is a relationship with unperson.
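The top-level plus and minus entries can be summarized in the same spirit. Again this is just a sketch over a hand-copied piece of the dump above; topic_counts is a hypothetical helper, and I assume that each entry maps a topic identifier to a list of assertion identifiers, as the output suggests.

```perl
use strict;
use warnings;

# the 'plus' and 'minus' parts of the diff, copied from the dump above
my $diff = {
    plus => {
        'tm://nirvana/sharkbert' => [
            '4f630424825d71832ec3abb49a99fab5',
            'e0bd658e93988c7de5710a9baa9a786b',
        ],
    },
    minus => {
        'tm://nirvana/whitey' => [
            '918e8b22537b7286640384db87d0a072',
            '137b04603d9e823e8278cc02bdb5b7de',
        ],
    },
};

# hypothetical helper: topic identifiers followed by their assertion count
sub topic_counts {
    my ($part) = @_;
    return map { "$_ (" . scalar(@{ $part->{$_} }) . ")" } sort keys %$part;
}

print "appeared:    ", join(', ', topic_counts($diff->{plus})),  "\n";
print "disappeared: ", join(', ', topic_counts($diff->{minus})), "\n";
# appeared:    tm://nirvana/sharkbert (2)
# disappeared: tm://nirvana/whitey (2)
```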

Getting the Details

The assertions are only indicated via their internal identifiers, and these are only meaningful in their respective maps.

You can look them up yourself (using the retrieve method), or you can ask diff to attach them for your convenience:

$tm2->diff ($tm1, { include_changes => 1 } )

They will appear under an assertion key of the hash structure.

But there is more: the toplets involved in any change will also be listed (under plus_midlets and minus_midlets).

Return Ticket

Finally, I should mention that the whole thing works the other way round as well:

warn Dumper $tm1->diff ($tm2);

So there should be no loss of information, and one should be able to round-trip between the two maps.
