Unhappy With SPARQL

At the moment I'm spec'ing out semantic extensions for an algebraic language for an environmental monitoring system. Sensor endpoints send raw measurement data to a concentrator. There the values are enriched, i.e. checked for plausibility and flagged accordingly, and possibly re-measured after a new calibration cycle. At a central server the raw data is then aggregated into more abstract concepts, such as a "sliding mean value over the last 6 hours" or "ozone strain on plants". This is where the algebraic language kicks in. It operates on time series.
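To make the "sliding mean value over the last 6 hours" concrete, here is a minimal Python sketch. It is not the algebraic language itself. The function name sliding_mean, the representation of a time series as a list of (timestamp, value) pairs sorted by time, and the sample data are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def sliding_mean(series, window=timedelta(hours=6)):
    """For each sample, average all values from the preceding `window`,
    including the sample itself and a sample exactly `window` old.
    `series` is a list of (timestamp, value) pairs sorted by timestamp."""
    result = []
    start = 0      # index of the oldest sample still inside the window
    total = 0.0    # running sum of the values inside the window
    for i, (t, v) in enumerate(series):
        total += v
        # Drop samples that have fallen out of the window.
        while series[start][0] < t - window:
            total -= series[start][1]
            start += 1
        result.append((t, total / (i - start + 1)))
    return result

# Hypothetical data: one sample every 3 hours with values 0, 1, 2, 3.
t0 = datetime(2008, 1, 1, 0, 0)
series = [(t0 + timedelta(hours=3 * k), float(k)) for k in range(4)]
means = sliding_mean(series)
```

The last sample averages only the values 1, 2 and 3, because the first sample is by then 9 hours old.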

In the course of processing, considerable meta-data can accrue, ranging from how a value was actually measured or computed (the procedure) to geographical information about where and when all of that happened.

This is not just commentary to be archived. The provenance and disposition of a value influence all further computations and, with that, the interpretation and visualization in end-user interfaces.

At the moment, much of this is hard-coded. But if you wanted to use the meta-data (each value is actually embedded in a small semantic cloud) more declaratively, you would need a very concise query language, one in which you could write, for instance:

"Give me the URL of the manual for the station with which these particular value(s) have been measured."
"The value is valid if the sensor is Model 7911 and the last calibration was no more than 30 days before the measurement."
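Stated imperatively, the second rule is small. A minimal Python sketch follows, in which the Measurement record and its field names are hypothetical stand-ins for whatever the meta-data cloud provides. It assumes that "30 days" is counted from the last calibration to the measurement time and that exactly 30 days still counts as valid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Measurement:
    # Hypothetical record; the field names are assumptions, not a real schema.
    sensor_model: str
    last_calibration: datetime
    measured_at: datetime

def is_valid(m, model="7911", max_age=timedelta(days=30)):
    """True if the sensor is the expected model and the calibration
    was at most `max_age` old when the value was measured."""
    calibration_age = m.measured_at - m.last_calibration
    return m.sensor_model == model and calibration_age <= max_age

fresh = Measurement("7911", datetime(2008, 1, 1), datetime(2008, 1, 20))
stale = Measurement("7911", datetime(2008, 1, 1), datetime(2008, 3, 1))
other = Measurement("1234", datetime(2008, 1, 1), datetime(2008, 1, 20))
```

Here is_valid(fresh) holds, while stale fails on calibration age and other fails on the model check.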

Yes, it is no problem to create this cloud in RDF, and yes, it also seems to be no big problem to formulate these queries in SPARQL:

SELECT ?url WHERE { ?value sml:measured-with [ swe:user-manual ?url ] }

and, respectively

ASK WHERE { ?value sml:measured-with [ rdfs:label "7911" ] .
            ?value sml:measured [ mon:lastCalibration ?caldate ;
                                  mon:measurementTime ?date ] .
            FILTER ( op:dayTimeDuration-less-than(
                         op:subtract-dateTimes( ?date, ?caldate ),
                         "P30D"^^xsd:dayTimeDuration ) ) }

The first can be regarded as a one-liner, the second not really. Apart from the messy dateTime handling, choosing ASK to return a boolean is a bit like walking on thin ice: should the requirement change subtly from "valid or not" to "valid, suspicious or invalid", I can no longer use ASK. Worse, SPARQL does not allow arbitrary value expressions in the SELECT clause, so this does not work as an escape route:

SELECT <something here>
          ? "valid"
          : <something else>
            ? "invalid"
            : "suspicious"
   WHERE ....
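In a host language the three-way verdict is trivial, which is what makes its absence in the query language so annoying. A Python sketch that mirrors the conditional chain above: the function name classify and especially the 60-day cut-off for "invalid" are invented for illustration, since the actual criteria for "suspicious" are not specified anywhere yet.

```python
from datetime import timedelta

def classify(sensor_model, calibration_age,
             model="7911",
             valid_within=timedelta(days=30),
             invalid_after=timedelta(days=60)):  # hypothetical threshold
    """Return "valid", "invalid" or "suspicious" for a measured value."""
    if sensor_model == model and calibration_age <= valid_within:
        return "valid"
    if sensor_model != model or calibration_age > invalid_after:
        return "invalid"
    # Right model, but the calibration is older than we would like.
    return "suspicious"

verdicts = [
    classify("7911", timedelta(days=10)),
    classify("7911", timedelta(days=45)),
    classify("7911", timedelta(days=90)),
    classify("1234", timedelta(days=10)),
]
```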

I guess what I am missing in SPARQL is the more elegant, functional approach that path languages such as Uche Ogbuji's VERSA had. But evidently their syntactic conciseness was dismissed at the time by the proponents of a more logic-oriented semantics and a more mainstream SELECT-ish syntax.

So what are my options?

  • Use RDF and add another layer on top of SPARQL that gives me a path-expression-like syntax. That is not too difficult.
  • Or use Topic Maps and TMQL in the first place. Path expression syntax works well there, although the expressions may appear cryptic to innocent Java users:

. <- [ ^ sml:measured-with ] -> device / user-manual


. [ . <- [ ^sml:measured-with ] -> [ . / model == "7911" ] ]
  [ . <- [ ^sml:measured ]
    ( . -> mon:lastCalibration, . -> mon:measurementTime )
    [ $1 - $0 < "P30D"^^xsd:dayTimeDuration ] ]

And I always have two escape routes when the specification changes:

  • Add more postfixes at the end for post-processing. TMQL allows that.
  • Fall back to FLWR syntax if I really need variables for a shorter query expression.

And, hey, we can still change TMQL itself!

Work supported by the Austrian Research Centers.
