Semantic Time Series (Temporal)

For the last two months I have been busy completing some exploratory work on semantic time series. As I mentioned earlier, this is all part of capturing sensor observation values and derived computations in a semantic network.

Temporal Processing

The temporal aspect of time series is now handled by a special processor, which - hey, you should know me! - is controlled by a DSL (domain-specific language).

Here is one example:

%A %B <| A[n] + B[n] |> @ every tick

This operator, when applied to two time series, computes the sum of corresponding values.

A and B are the formal parameters; each will be bound to a whole time sequence of values when the operator is invoked. The rest of the syntax then generates a new sequence by adding up the corresponding values in A and B.
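
To make these semantics concrete, here is a plain-Perl sketch of what the operator computes (just the element-wise semantics, not how the processor is implemented). A sequence is represented as a list of [timestamp, value] pairs, and both sequences are assumed to tick in lockstep:

use strict;
use warnings;

# Element-wise sum of two aligned time sequences, each a listref
# of [timestamp, value] pairs.
sub add_sequences {
    my ($A, $B) = @_;
    my @C;
    for my $n (0 .. $#$A) {
        push @C, [ $A->[$n][0], $A->[$n][1] + $B->[$n][1] ];
    }
    return \@C;
}

my $C = add_sequences( [ [0, 1.5], [60, 2.0] ],
                       [ [0, 0.5], [60, 1.0] ] );   # [0, 2.0], [60, 3.0]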

As the syntax suggests, there are many things you can - or must - control, among them specific time patterns (e.g. every 2 minutes) or semantic annotations (this has been measured with device D at location L). More about that later.

Pipelines

What I found intriguing is how such operators can be combined into larger pipelines. Here is one with which I want to compute a more intelligent (semantic) visualisation of how individual stories in my blog evolve traffic-wise:

my $logs = F3::TS::Accesslog->new ('/apache/access.log');
$logs->apply ('<| [n] if interesting |>
               | Classify
               | Speed { interval => hour }
               | Tvisualize
'
);

The first line obviously reads the Apache log file and interprets it as a time sequence. The name of the downloaded file is used as the value; everything else (size, MIME type, ...) is metadata. Each entry carries the timestamp of the download.
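
Roughly speaking (a simplified sketch, not the actual F3::TS::Accesslog code), a single line in the common log format becomes an entry like this:

use strict;
use warnings;

# One common-log line becomes a timestamped entry whose value is
# the requested file; the remaining fields end up as metadata.
my $line = '1.2.3.4 - - [10/Oct/2007:13:55:36 +0200] "GET /blog/story42 HTTP/1.1" 200 2326';
if ($line =~ m{\[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d+) (\d+)}) {
    my %entry = (
        timestamp => $1,               # time of the download
        value     => $3,               # name of the downloaded file
        meta      => { method => $2, status => $4, size => $5 },
    );
}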

The apply method executes a pipeline of operators on this sequence. The first operator filters out those log entries which are not interesting (images, CSS, JavaScript files). The property interesting is one I had to register beforehand.
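
The predicate behind it could be as simple as a suffix test (a sketch; how properties are registered with the processor is a story for another day):

# Sketch of the predicate behind the 'interesting' property: keep
# everything which is not an image, a stylesheet or JavaScript.
sub is_interesting {
    my ($path) = @_;
    return $path !~ m{\.(?:png|gif|jpe?g|ico|css|js)$}i;
}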

The output is piped further into Classify. That classifies (sic!) the values and creates a new subsequence for every distinct value found. If there were 200 different blog stories, then there are now 200 parallel time sequences.
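
Conceptually, Classify is nothing but a partitioning step (again a plain-Perl sketch of the semantics, using the [timestamp, value] representation from above):

# Partition one sequence into one subsequence per distinct value,
# e.g. one subsequence per blog story.
sub classify {
    my ($seq) = @_;                        # listref of [timestamp, value]
    my %subseqs;
    push @{ $subseqs{ $_->[1] } }, $_ for @$seq;
    return \%subseqs;                      # value => its own subsequence
}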

For each of them (that is part of the processor semantics) the next operator is applied. It computes the speed, i.e. the number of requests per hour. This results in 200 sequences, each now tracking the speed of one blog entry, hour by hour.
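
In plain Perl the computation for one such subsequence might look like this (a sketch of the semantics; timestamps are assumed to be epoch seconds):

# Sketch of the Speed step for one subsequence: bucket the entries
# into hours and count, giving the number of requests per hour.
sub speed_per_hour {
    my ($seq) = @_;                        # listref of [epoch_secs, value]
    my %speed;
    $speed{ int($_->[0] / 3600) }++ for @$seq;
    return \%speed;                        # hour bucket => request count
}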

The visualizer will then select an appropriate form to present this to the user. This is still wishful thinking on my part.

Compiling into MapReduce

All of the above operators (except Tvisualize) can be rewritten in terms of more basic operators, so that the above pipeline is eventually compiled into:

$logs->apply ('<| Ngrep { lambda => n.is_interesting } |>
                 | Nfork
                 | ( Tfork { beat => @ every 3600 secs,
                             window => 3600 secs }
                   | Nreduce { lambda => count }
                   | Nmap { lambda => [n] / 3600 }
                   | Tjoin
                   )
                 | Tvisualize
'
);

If you have followed my thoughts on integrating MapReduce into Perl, then you know where this is heading.
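
Spelled out with the usual key/value conventions (a sketch of the correspondence, not actual compiler output), the bracketed sub-pipeline is a textbook map and reduce:

# Map phase (Nfork + Tfork): each entry is keyed by its story and
# its 3600-second window.
sub map_phase {
    my ($entry) = @_;                      # [epoch_secs, story]
    return [ $entry->[1], int($entry->[0] / 3600) ];
}

# Reduce phase (Nreduce + Nmap): count the entries sharing a key and
# scale by the window length, i.e. requests per second in that hour.
sub reduce_phase {
    my ($key, $entries) = @_;
    return scalar(@$entries) / 3600;
}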


Work supported by the Austrian Research Centers.
