Data Dynamics in Semantic Systems (Part I)
When people design semantic systems, a typical architecture looks something like this:
- An RDF triple (or Topic Maps) store for the odd and irregular data, and
- some relational DB, either imported into the semantic store, or wrapped, or linked via a message bus (MQ, events, ...).
- Some more or less sophisticated integration, and
- the user interface on top of it.
Now this is all well and good for your middle-of-the-road semantic portal, but the class of applications I have in mind has one thing in common:
Data dynamics, with temporal and geospatial aspects. And that with physical units.
Formula 3 Middleware
As part of a larger project (SANY, Sensors Anywhere) the folks in Seibersdorf (AIT, Austrian Institute of Technology) and I have worked on middleware that should take away some of the pain that developers of typical geosemantic applications face:
Dealing with (semantic) time series. Lots of them. And with all the computations involved in producing new time series from existing ones.
Time Series Processing
In our interpretation, a time series is simply a series of values, each sitting in its own time slot with a time stamp. Along with the value you can also store any meta data.
One meta datum is a key/value pair, so you can later interpret it as an RDF property or a TM occurrence. Or just leave it as meta data if you do not need a semantic network at all.
Once you have one or more such time sequences you can inject them into a processor (cleverly called a time series processor, TSP). That TSP is controlled by a script written in the language F3, in the same way as an XSLT processor is controlled by an XSLT style sheet.
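To make the data model concrete, here is a minimal sketch in Python (not F3). The names Slot and as_triples, the sample values and the subject identifier are all made up for illustration; the actual middleware may represent slots quite differently.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Slot:
    """One time slot: a time stamp, a value and optional key/value meta data."""
    timestamp: datetime
    value: float
    meta: dict = field(default_factory=dict)


def as_triples(subject, slot):
    """Read each meta datum as a (subject, property, object) triple.

    This is only one possible reading; the meta data can just as well
    stay plain key/value pairs.
    """
    return [(subject, key, val) for key, val in slot.meta.items()]


# Illustrative values only, not real measurements.
slot = Slot(datetime(2010, 6, 1, 12, 0), 23.5, {"sensor": "thermo-1"})
triples = as_triples("slot-1", slot)
```

Here triples holds a single entry, ("slot-1", "sensor", "thermo-1"), which could be handed to a semantic store later or ignored altogether.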
On the outgoing side such a TSP will produce one or more time series.
As an example, let us assume we had a time series with temperatures (first here without any unit):
If we wanted to filter out those values which are above 25, then the following Formula 3 expression could do just that:
The [n] symbolizes one value in the time series. The brackets <> imply that the whole time sequence is iterated through, i.e. each individual time slot is visited.
Inside the <> is a typical if-then-else-otherwise construct. The only variation from the Pascal/C/Fortran/Java syntax is that the condition follows the value. Mathematicians prefer this notation, but we know them to be weird people.
In effect, the processor will iterate over the sequence, possibly, but not necessarily, in its natural order. For each time slot the value will be compared with 25. If it is greater, the value will be echoed onto the outgoing side. If not, it will be silently ignored.
Not only the value will be echoed: the timestamp will be carried over as well, and so will, in this simple case, any meta data attached to the time slot.
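For readers who think in general-purpose languages, the same behaviour can be sketched in Python. This is an analogy, not F3 syntax: the tuple layout (timestamp, value, meta data), the function name keep_above and the sample values are assumptions made for this illustration, and unlike a TSP this sketch always walks the slots in their natural order.

```python
from datetime import datetime

# Each slot is a (timestamp, value, meta data) tuple. Illustrative values only.
series = [
    (datetime(2010, 6, 1, 10, 0), 22.0, {"sensor": "thermo-1"}),
    (datetime(2010, 6, 1, 11, 0), 26.5, {"sensor": "thermo-1"}),
    (datetime(2010, 6, 1, 12, 0), 25.0, {"sensor": "thermo-1"}),
    (datetime(2010, 6, 1, 13, 0), 27.1, {"sensor": "thermo-1"}),
]


def keep_above(slots, limit):
    """Echo every slot whose value is greater than limit; silently drop the rest.

    Time stamp and meta data travel along with the value.
    """
    return [(ts, value, meta) for ts, value, meta in slots if value > limit]


hot = keep_above(series, 25)
```

Only the 11:00 and 13:00 slots survive; the slot with exactly 25 is dropped because the comparison is strict.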
Data cannot be trusted, and this is particularly true for measurement data. One safeguard against faulty values is to estimate them if they look suspicious enough.
Let us have a look at the following TSP:
or [n] otherwise >
Again it uses a condition to test a value against a limit. If the value exceeds that limit, a new value will simply be estimated from the previous and the next value in the sequence, symbolized by [n-1] and [n+1], respectively.
If the value is within the bounds, it will be copied over to the outgoing sequence.
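A rough Python equivalent of this repair step might look as follows. Several things here are assumptions, not taken from F3: the estimate is the plain mean of the two neighbours, the limit is passed in as a parameter, and the first and last values are copied unchanged because they lack a neighbour on one side.

```python
def estimate_outliers(values, limit):
    """Replace values above limit by the mean of their neighbours.

    Assumptions for this sketch: the estimate is the arithmetic mean of
    [n-1] and [n+1], taken from the original (unrepaired) sequence, and
    values at either end are copied as they are.
    """
    result = []
    for n, value in enumerate(values):
        has_both_neighbours = 0 < n < len(values) - 1
        if value > limit and has_both_neighbours:
            result.append((values[n - 1] + values[n + 1]) / 2)
        else:
            result.append(value)
    return result


# Illustrative values only: the 90.0 looks suspicious and gets estimated.
cleaned = estimate_outliers([20.0, 90.0, 22.0, 23.0], 50.0)
```

With these inputs the suspicious 90.0 becomes 21.0, the mean of 20.0 and 22.0, and every other value is copied over untouched.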
And The Semantics?
While much can (and will :) be said about the mechanics of the language, in the next installment I will show you how meta data can be properly attached to values and how F3 can help you to consistently propagate semantic information forward.
Because this is my general agenda: To have something which helps me to cope with quantitative and with qualitative data in one go.
And behind that is an even bigger agenda. But one thing after the other.