Wolf in Sheep's Clothing (Part I)

The other week at the Triple-I conference Andreas Blumauer mentioned AllegroGraph to me as a product which can do geospatial and temporal reasoning. I was not deterred by their criminally '90s-ish web site (Update 4.4.09: there is now a new, flashy design!) and downloaded their free Java edition, which has an upper limit of 50,000,000 triples.

[Image: http://kill.devc.at/system/files/wolf.jpg]

The company has a long and strong background in developing Lisp engines; in fact, Common Lisp is not only still one of their products, it is also at the base of the AllegroGraph server. That server is basically a graph database, one which can store nodes and edges. Lots of them. All organized in quadruples, together with appropriate indices.

To position their product in the Semantic Web market, however, they have added an RDF layer, what they call RDFS++ reasoning (more about that later), and SPARQL support. There is also some support for geospatial and temporal information, but that I will cover in another installment. First, some basics.

Starting the Tuple Store Server

The first positive surprise comes when you start the server:

$ ./AllegroGraphServer --port 4567 --http-port 1222

Not only is it up and running instantly, it lives in only 36 MB of virtual memory, with 10 MB resident. Obviously no Java involved. Yet.

Client Options

While there is also Sesame2 support, there seem to be two ways to access the server via a binary protocol (looks like RPC?). One is a Lisp client, but as most kids nowadays have only heard about that language from their great-grandparents, I had better stick to the uninspired language here.

After unpacking, the Java examples live in src/com/franz/ag/examples/, or are alternatively reachable via the .jar in your IDE of the day. It is advisable to compile them all, as the documentation suggests:

$ mkdir bin
$ javac -cp lib/agraph.jar src/com/franz/ag/examples/*.java -d bin

and ignore any compilation problems.

The good news is that the Java API does not flood you with an insane 1000+ classes, as seems to be today's norm in corporate programming. But the API is not all roses.

Connecting to the Server

To get a handle to the server you have to make sure to hit the correct port and then enable the connection.

import com.franz.ag.*;

AllegroGraphConnection ags = new AllegroGraphConnection();
ags.setPort(4567);
ags.enable();

The Java API designer shied away from calling the method connect. That would have been far too obvious, I guess.

You can also disconnect, uhm sorry, disable the connection:

ags.disable();
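
In practice you will want to pair the two, so the connection is given back even when something in between goes wrong; the try/finally scaffolding here is my own sketch, not from the examples:

ags.enable();
try {
    // ... talk to the server ...
} finally {
    ags.disable();   // always release the connection
}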

Accessing a Tuple Store

Since a server can carry any number of tuple stores, you have to name one and specify the location where it lives on disk:

AllegroGraph ts = ags.create("rumsti", "/where/store/should/be/");

Which is kinda weird, as this is all happening on a remote server. You would expect this information to be handled via server-side configuration, not to shine through to the client.

Besides the method create, there are also open (for an existing store), access (open or create), renew (create and clean) and replace (clean). One may wonder what was wrong with the POSIX open-file semantics we have all been using for decades.
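
For scripts that should run whether or not the store already exists, access looks like the convenient choice; a sketch, simply swapping it in for the create above:

AllegroGraph ts = ags.access ("rumsti", "/where/store/should/be/");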

Ah, and if you want to close your store at the end, then it is not just close, but

ts.closeTripleStore();

probably just to enter the state-the-non-obvious contest.

Storing RDF

Once you hold a tuple store you can add information to your graph, or as the semantic web people call it: add triples to the model.

To make life easier, we register some namespace prefixes:

ags.registerNamespace ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
ts.registerNamespace ("owl", "http://www.w3.org/2002/07/owl#");

Just for demonstration, one is registered server-wide, the other only tuple-store-wide. From then on, things are maybe cumbersome, but otherwise straightforward:

URI CAT   = ts.createURI ("http://rumsti.org/Cat");  
URI type  = ts.createURI ("!rdf:type");  
URI CLASS = ts.createURI ("!owl:Class");
ts.addStatement (CAT, type, CLASS);

BlankNode sp = (BlankNode)ts.createBNode("_:sacklpicka");
ts.addStatement (sp, type, CAT);

and for those with literals (note that label actually lives in the RDFS vocabulary, so we register that prefix as well):

ts.registerNamespace ("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
URI label = ts.createURI ("!rdfs:label");
ts.addStatement (sp, label, ts.createLiteral ("Sacklpicka"));

The API also exuberantly offers several variations of all this, such as a completely different method which also gives you a handle to the triple:

Triple tr = ts.newTriple(
     "<http://rumsti.org/Cat>",
     "<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
     "<http://rumsti.org/Mammal>");

If there is a reason for this, I might not want to hear about it.

In case you need to get rid of a certain statement, then there is a

ts.removeStatement (...);

at least if you know all components of that triple. If you want to keep things loose, you can use null as a wildcard:

ts.removeStatements (sp, label, null);
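
And if I read the wildcard semantics correctly, widening the net should wipe everything the store knows about the poor cat (my extrapolation, not from the documentation):

ts.removeStatements (sp, null, null);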

Loading RDF from Files

As is to be expected, there is also a way to load triples from N3 or RDF/XML files:

ts.loadNTriples ("...");
ts.loadRDF ("...");

What is a bit weird about that is that the files must already be on the server. That conflicts somewhat with my view of a client-server architecture, if I have to copy files around before being able to use them.

Once you have bulk-loaded larger amounts, it is appropriate to kick the server into indexing:

ts.indexAllTriples();

I personally find it great to have control over when and how indexing should occur. No, I mean that, really.
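
So a bulk load presumably boils down to this pair; the path is hypothetical, and, as complained about above, it is the one the server sees:

ts.loadNTriples ("/data/on/the/server/cats.ntriples");   // hypothetical server-side path
ts.indexAllTriples ();                                    // then kick indexing explicitly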

Retrieving Content

As with many other comparable stores, you can scour for interesting triples using a template:

Cursor cc = ts.getStatements (false,
                              null,
                              "!rdf:type",
                              "http://rumsti.org/Cat");
while (cc.step()) {
   System.out.println( cc.getTriple().toString() );
}

One may wonder here why the Java convention of naming the iteration method next has been violated. OTOH, convention is just so ... conventional.

More expressivity can be had with SPARQL, of course. But the way this works is somewhat surprising. You first create an empty query:

SPARQLQuery sq = new SPARQLQuery();

Then you can specify whether reasoning should be used with it.

sq.setIncludeInferred(false);

Only then do you provide the actual SPARQL text and run the query, all in one go:

String query =
     "SELECT ?cat " +
     "WHERE {" +
         "?cat rdf:type <http://rumsti.org/Cat> ." +
     "}";
ValueObject[][] r = sq.select (ts, query);

Not sure where any optimization of SPARQL queries can go with that API scheme.

The tabular results can then be inspected manually:

String[] var = sq.getResultNames();
System.out.println("Number of solutions: " + r.length);
for (int i = 0; i < r.length; i++) {
     ValueObject[] objects = r[i];
     System.out.println("Solution " + (i+1) + ":");
     for (int j = 0; j < objects.length; j++) {
          System.out.println("  " + var[j] + " = "
                                  + printValueObject (objects[j]));
     }
}

A copy of printValueObject is thankfully included in AGUtils.java.
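
In case you do not have the examples at hand, a lazy stand-in does for a first look; this is merely my sketch, not the bundled version:

static String printValueObject (ValueObject o) {
    return (o == null) ? "null" : o.toString();   // lean on toString for a quick impression
}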

RDFS++ Reasoning

Whenever you retrieve statements you can control whether the store treats your RDF information as a graph (as-is) or whether it honors some inferencing rules.

There is no inferencer architecture like in Jena, and not even direct OWL support; something I can perfectly live with, but which may well shock Semantic Web acolytes.

What you can do is ask the server to optionally honor the semantics of owl:TransitiveProperty and the following properties:

  • rdfs:subClassOf
  • rdfs:subPropertyOf
  • rdfs:domain and rdfs:range
  • owl:inverseOf
  • owl:sameAs

You only need to tell your SPARQL query to do so:

sq.setIncludeInferred(true);

That may actually carry you quite far. And it may also prove perfect if you wanted to map Topic Maps structures onto such a store. I mean, if you ever wanted to use such evil technology.
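
Presumably the first boolean parameter of getStatements toggles the same behavior; if that reading is right, the subClassOf triple from above should make our blank node surface as a mammal too (a sketch, reusing the identifiers from earlier):

Cursor mm = ts.getStatements (true,       // honor inference (my assumption about this flag)
                              null,
                              "!rdf:type",
                              "http://rumsti.org/Mammal");
while (mm.step()) {
    System.out.println( mm.getTriple().toString() );   // should now list the cat as well
}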

Still to be Uncovered

While the Java API makes you raise your eyebrows every now and then, it seems to do its job of exposing some of the AllegroGraph functionality. But there is much more to look at, not only the extensions (geospatial and temporal) or the Sesame2 support.

There is also the option to park Horn clauses (a.k.a. Prolog clauses) on the server, something which may be far more useful than having OWL, of course.

Nice machinery overall, have to keep digging.

Great little intro to AllegroGraph, thanks!

Martin BG (not verified) | Mon, 09/07/2009 - 16:32