Data Dynamics in Semantic Systems (Part II)

(continued from Part I)

Last time I implicitly proposed thinking of some parts of a (geosemantic) application in terms of time series. This is not so far-fetched: consider, for instance, a semantically enabled tourism application for, say, Vienna.

Sure, there are a number of very static things you would store in the semantic network:

  • the sites, churches, cathedral, churches, and even more churches,
  • the museums, galleries, museums and even more museums,
  • the tourism ontology, containing buildings, museums, and yes, the churches.

But even if this is Vienna, not everything is static: there are (insane) traffic conditions, (predominantly Italian and Spanish) tourists roaming through the city, and concert tickets sold at the weirdest places. All of these are perfect candidates to be packed into time series.

Ticket Sales

Let us assume we keep an eye on the ticket sales for the Schönbrunn Palace. Unsurprisingly, the data is delivered to your fancy application as a mundane CSV file:

Date;                 ID;"Value";Code
10/03/2008 12:10;T000005;       ;A
10/03/2008 12:20;T000006;       ;C
10/03/2008 12:30;T000007;       ;Z
10/03/2008 12:40;T000008;15     ;G
10/03/2008 12:50;T000009;       ;A
10/03/2008 13:00;T000010;       ;G

While the date of the sales event and the ID of the ticket are easily recognizable, the Value and Code columns both carry implicit meaning. It is the typical exercise for semantic systems to make all implicit stuff explicit:

  • The codes stand for adult (A) ticket, child (C), concession (Z) and group (G).
  • And only if it is a group ticket, the value is set to the size of the group. Otherwise it defaults to 1.

Semantic Uplift

When you load the CSV into F3, it will at best organize one time slot into something like this:

value: ""
  datetime : 10/03/2008 12:10
  ID       : T000005
  Code     : A
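
If you want to reproduce this record layout outside F3, a few lines of Python will do. This is only a sketch: the function name load_ticket_csv and the dict layout are my own assumptions and say nothing about how the F3 CSV loader works internally. The timestamp is kept as a plain string, because the export does not tell us whether 10/03 is day-first or month-first.

```python
import csv
import io

# Two rows taken from the export shown above.
SAMPLE = '''Date;                 ID;"Value";Code
10/03/2008 12:10;T000005;       ;A
10/03/2008 12:40;T000008;15     ;G
'''

def load_ticket_csv(text):
    """Parse the semicolon-separated export into one record per time slot."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    # Header cells carry padding, so strip whitespace (quotes are removed by csv).
    header = [cell.strip() for cell in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        cells = dict(zip(header, (cell.strip() for cell in row)))
        records.append({
            "value": cells["Value"],    # still "" when the column is empty
            "datetime": cells["Date"],  # kept as text, see note above
            "ID": cells["ID"],
            "Code": cells["Code"],
        })
    return records

records = load_ticket_csv(SAMPLE)
```

The first record comes out with an empty value and the raw code A, which is exactly the mess the following operators clean up.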

You can try to tweak the configuration of the CSV loader, but it is much easier and more obvious to clean up this mess with F3 itself.

For instance, to make the default value explicit:

<    1          if [n] = ""
 or [n]         otherwise    >

If such a TSP is confronted with our ticket sales time series, it will emit the value 1 wherever the value is missing and pass the original value through otherwise.

As mentioned earlier, all meta data (time, ID, Code) is passed on unperturbed. (But only in this simple case. This is a longer topic.)
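
For comparison, here is roughly what that operator does, written as a Python generator. It is a sketch under assumptions: observations are modelled as dicts with a "value" key plus the meta data columns, and the name default_to_one is made up for this illustration.

```python
def default_to_one(series):
    """Yield each observation, replacing an empty value with 1."""
    for obs in series:
        value = 1 if obs["value"] == "" else obs["value"]
        # Copy the record so that datetime, ID and Code travel along unchanged.
        yield {**obs, "value": value}

sales = [
    {"value": "", "datetime": "10/03/2008 12:10", "ID": "T000005", "Code": "A"},
    {"value": "15", "datetime": "10/03/2008 12:40", "ID": "T000008", "Code": "G"},
]
cleaned = list(default_to_one(sales))
```

Like the F3 version, the sketch does not convert existing values; the group size stays whatever the loader delivered.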

Being good semantic Samaritans, we also want to get rid of the codes and replace them with something more meaningful:

<
  [n]
  { rdf:type => tou:adult-ticket      if [n].Code = 'A'
             or tou:child-ticket      if [n].Code = 'C'
             or tou:concession-ticket if [n].Code = 'Z'
             or tou:group-ticket      if [n].Code = 'G' ,
    tou:ticket-ID    => [n].ID
  }
>

This time we left the value [n] itself unchanged, but have explicitly specified meta data using the {} braces:

  • One key/value pair holds the type of the ticket. Here we take the QName rdf:type from the RDF namespace, and assume a tou namespace to hold our ticket types. Otherwise the if-or-otherwise structure is the same as before.
  • And for the ticket ID we also use a more explicit QName and simply copy over the value with [n].ID.
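
The same mapping can be sketched in Python with a lookup table. Two things here are assumptions of mine and not confirmed F3 behaviour: an unknown code simply leaves rdf:type unset, and the timestamp is carried along next to the new meta data. The names TICKET_TYPES and annotate_ticket are hypothetical.

```python
# Code letters from the CSV mapped to the QNames used in the operator above.
TICKET_TYPES = {
    "A": "tou:adult-ticket",
    "C": "tou:child-ticket",
    "Z": "tou:concession-ticket",
    "G": "tou:group-ticket",
}

def annotate_ticket(obs):
    """Keep the value, replace the raw columns with explicit properties."""
    meta = {"tou:ticket-ID": obs["ID"]}
    ticket_type = TICKET_TYPES.get(obs["Code"])
    if ticket_type is not None:  # assumption: unknown codes stay untyped
        meta["rdf:type"] = ticket_type
    return {"value": obs["value"], "datetime": obs["datetime"], "meta": meta}

annotated = annotate_ticket(
    {"value": 1, "datetime": "10/03/2008 12:20", "ID": "T000006", "Code": "C"}
)
```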

But our work is not complete. We need to tag not only the individual observation values with proper semantic information; the time series as a whole should be tagged as well:

<
  [n]
  # same as before
> {
     f3:phenomenon => tou:ticket-sale,
     tou:location  => tou:schoenbrunn
  }

The meta data attached to the time series signals two things:

  • Firstly, that the phenomenon being observed is tou:ticket-sale. The key f3:phenomenon is understood by the F3 middleware, which makes it easier to find the time series later.
  • Secondly, that our observations are made at a certain location, here tou:schoenbrunn.
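
To see why series-level meta data pays off when searching, consider this small Python sketch. The TimeSeries class, the helper names and the tou:traffic phenomenon are all hypothetical; the F3 middleware has its own lookup mechanism, which is not shown in this post.

```python
from dataclasses import dataclass, field

@dataclass
class TimeSeries:
    observations: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)  # properties of the series as a whole

def tag_series(series, extra_meta):
    """Return a copy of the series with additional series-level meta data."""
    return TimeSeries(list(series.observations), {**series.meta, **extra_meta})

def find_by_phenomenon(catalog, phenomenon):
    """Pick all series that observe the given phenomenon."""
    return [s for s in catalog if s.meta.get("f3:phenomenon") == phenomenon]

sales = tag_series(
    TimeSeries(observations=[1, 1, 1, 15]),
    {"f3:phenomenon": "tou:ticket-sale", "tou:location": "tou:schoenbrunn"},
)
traffic = TimeSeries(meta={"f3:phenomenon": "tou:traffic"})  # made-up second series
hits = find_by_phenomenon([sales, traffic], "tou:ticket-sale")
```

Only the ticket sales series ends up in hits, without anyone having to inspect a single observation.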

Putting it Together

So now that we have these three operators, how can they work together? In our application they have to be pipelined, and for the pipe plumbing enthusiasts among us, the | pipeline symbol is the obvious choice to chain the operators:

<    1          if [n] = ""
 or [n]         otherwise    >
|
<
  [n]
  { rdf:type => tou:adult-ticket      if [n].Code = 'A'
             or tou:child-ticket      if [n].Code = 'C'
             or tou:concession-ticket if [n].Code = 'Z'
             or tou:group-ticket      if [n].Code = 'G' ,
    tou:ticket-ID    => [n].ID
  }
>
|
<
  [n]
  # same as before
> {
     f3:phenomenon => tou:ticket-sale,
     tou:location  => tou:schoenbrunn
  }
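
The chaining idea itself is nothing exotic. Below is a minimal Python sketch of a left-to-right pipeline over the three steps. Everything in it (the pipe helper, the stage names, the dict-based series layout) is an assumption for illustration and not the way F3 evaluates its | operator.

```python
from functools import reduce

def pipe(*stages):
    """Chain stages left to right, in the spirit of `a | b | c`."""
    return lambda series: reduce(lambda acc, stage: stage(acc), stages, series)

TYPES = {"A": "tou:adult-ticket", "C": "tou:child-ticket",
         "Z": "tou:concession-ticket", "G": "tou:group-ticket"}

def fill_default(series):
    # Stage 1: a missing value becomes 1, everything else passes through.
    obs = [{**o, "value": 1 if o["value"] == "" else o["value"]}
           for o in series["observations"]]
    return {**series, "observations": obs}

def annotate(series):
    # Stage 2: replace the raw Code/ID columns with explicit properties.
    obs = [{"value": o["value"], "datetime": o["datetime"],
            "meta": {"rdf:type": TYPES.get(o["Code"]), "tou:ticket-ID": o["ID"]}}
           for o in series["observations"]]
    return {**series, "observations": obs}

def tag(series):
    # Stage 3: attach meta data to the series as a whole.
    return {**series, "meta": {**series["meta"],
                               "f3:phenomenon": "tou:ticket-sale",
                               "tou:location": "tou:schoenbrunn"}}

uplift = pipe(fill_default, annotate, tag)
raw = {"meta": {}, "observations": [
    {"value": "", "datetime": "10/03/2008 12:10", "ID": "T000005", "Code": "A"},
]}
result = uplift(raw)
```

Each stage only sees the output of the previous one, which is why the order matters: the annotation step still needs the Code column that the loader delivered.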

Apart from the pipe symbol |, F3 also understands & to stack operators on top of each other; you will see some examples later. You can also group operators with (). And there is some magic involved when doing one and the same thing with many time series.

Property Management

There are also a number of clever defaults and little tricks to make meta data and property management less painful. I will return to that in a little while.
