Wolf in Sheep's Clothings (Part II)
A while back I experimented with the Java client API of AllegroGraph to talk to a triple store.
The latest release (V3.1.1) also sports a Python client, which immediately aroused my interest for several reasons:
- It uses a new HTTP protocol to talk to the AllegroGraph server, one based on JSON.
- Its API follows that of Sesame.
The following simply goes through the basic motions, as also described by the Python tutorial.
Installation
Setting things up under Debian Linux is trivial as the requirements are already packaged:
For Mac OS X I went via macports.org:
python_select python25
port install py25-curl
port install py25-cjson
Once you have unpacked the agraph distribution, you will find a python directory holding the client code. To use it, make sure that your Python interpreter picks it up:
For some reason under OS X the Python code is kept under DISTFJE/python/.
Then the pydoc command works as well:
Fill in the dots
To start the server the documentation instructs you to use (one line):
--new-http-auth sacklpicker:catbert
--new-http-catalog /tmp/scratch/
That works, but only under Linux. Under OS X the program loudly complains that it does not understand the --new-http-catalog option. With some guesswork, the option there turns out to be --new-http-db, as ./AllegroGraphServer --help cryptically insinuates:
:: .....................
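The platform difference described above could be wrapped in a small helper when starting the server from a script. This is only a sketch: the function name `catalog_option` is made up for illustration, and the two option names are simply the ones I observed with V3.1.1, not a documented contract.

```python
import sys


def catalog_option(directory, platform=None):
    """Return the command-line arguments that point the server at a directory.

    Assumption (taken from the behaviour observed in this post): the OS X
    build expects --new-http-db, the Linux build --new-http-catalog.
    """
    if platform is None:
        platform = sys.platform
    if platform == "darwin":
        flag = "--new-http-db"
    else:
        flag = "--new-http-catalog"
    return [flag, directory]
```

The returned list can be appended to whatever argument list is used to launch ./AllegroGraphServer.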
The HTTP authentication can also be dropped. I cannot imagine that anyone would seriously consider exposing the server to the open Internet.
Another caveat concerns the use of ~ in the invocation. A
is internally translated into
which caused some head, uhm, scratches.
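One way to sidestep the tilde problem is to expand the path yourself before it ever reaches the server. A minimal sketch using only the standard library; `expand_catalog_path` is a hypothetical helper name, not part of AllegroGraph:

```python
import os.path


def expand_catalog_path(path):
    """Expand a leading ~ and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))
```

Passing the result on the command line means the server only ever sees a plain absolute path.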
Open Sesame, Open Sesame
From now on everything is downhill: First you get a server object, just to investigate the catalogs which are available there:
server = AllegroGraphServer("localhost", port=8080)
print server.listCatalogs()
Amazingly enough, this always works, even without any authentication. Only when you want to open a particular catalog do you need proper authorization:
server.password = "catbert"
catalog = server.openCatalog('scratch')
print catalog.listRepositories()
As each catalog can hold several repositories (models in RDF-speak), you will have to home in on one first. That can be a newly created one or an existing one. In the AllegroGraph tradition, all this is controlled with proper constants:
r = Repository(catalog, "catlitter", Repository.RENEW)
r.initialize()
The initialization seems to be important and necessary. It is not packaged into the constructor.
Nota bene: The Repository.RENEW fails under the OS X version with "There is already a store named 'catlitter'". Looks like a bug to me, as it works flawlessly under Linux.
Factory
From the repository one can obtain a factory, obviously an artificial construct which allows you to mint RDF objects:
subject = f.createURI("http://cata.log/sacklpicker")
If you want to have more explicit control over the namespace handling, then a slightly cumbersome
sacklpicker = f.createURI(namespace=ns, localname="sacklpicker")
Cat = f.createURI(namespace=ns, localname="Cat")
is the way to go. I just wonder whether the namespace handling could have been moved into the factory object. I mean, if we already have that.
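To illustrate what I mean, a thin wrapper can bind the namespace once and delegate to the factory. This is a sketch, not part of the client: the `Namespace` class and its `uri` method are made up, and `StubFactory` merely stands in for the real factory so that the example runs without a server. Only the `createURI(namespace=..., localname=...)` call shape is taken from the code above.

```python
class Namespace(object):
    """Bind a namespace to a factory so URIs can be minted by local name."""

    def __init__(self, factory, namespace):
        self.factory = factory
        self.namespace = namespace

    def uri(self, localname):
        # Delegate to the factory, filling in the namespace every time.
        return self.factory.createURI(namespace=self.namespace,
                                      localname=localname)


class StubFactory(object):
    """Stand-in for the real value factory (assumption, for illustration)."""

    def createURI(self, namespace=None, localname=None):
        return "<%s%s>" % (namespace, localname)


cata = Namespace(StubFactory(), "http://cata.log/")
sacklpicker = cata.uri("sacklpicker")
Cat = cata.uri("Cat")
```

With the real factory in place of the stub, `cata.uri("hates")` would replace each of the longer createURI calls.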
Transacting with the Store
From the repository object you also have to generate a connection object first, at least if you want to modify or query the repository.
The code reveals that there can actually only be one per repository:
if not self.connection:
    self.connection = RepositoryConnection(self)
return self.connection
So this is just following Sesame conventions.
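The practical consequence of that snippet is that every caller shares one connection object. The following toy version shows it in isolation; both classes are stubs written for this post, and the method name `getConnection` is my assumption based on the Sesame naming.

```python
class RepositoryConnection(object):
    """Stub standing in for the client's connection class."""

    def __init__(self, repository):
        self.repository = repository


class Repository(object):
    """Stub repository handing out a single, lazily created connection."""

    def __init__(self):
        self.connection = None

    def getConnection(self):
        # Create the connection on first use, then keep returning it.
        if not self.connection:
            self.connection = RepositoryConnection(self)
        return self.connection


r = Repository()
first = r.getConnection()
second = r.getConnection()
print(first is second)  # the very same object both times
```

So closing the connection in one part of a program closes it for every other part that asked the same repository for one.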
Once you have got hold of the connection, you can insert triples into the store:
c.add(sacklpicker, RDF.TYPE, Cat)
hates = f.createURI(namespace=ns, localname="hates")
tomcat = f.createURI(namespace=ns, localname="tomcat")
c.add(sacklpicker, hates, tomcat)
c.add(tomcat, RDF.TYPE, Cat)
And you can ask for the current repository (uhm, connection, whatever) size:
And via that connection you can launch your queries:
q = c.prepareTupleQuery(QueryLanguage.SPARQL,
"""
PREFIX c: <http://cata.log/>
SELECT ?cat WHERE {?cat a c:Cat .}
""")
# evaluate outside the try block, so that ts is always bound in finally
ts = q.evaluate()
try:
    for t in ts:
        print t.getValue("cat")
finally:
    ts.close()
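The close-in-finally dance can also be left to contextlib.closing from the standard library, which calls close() on any object when the block is left. In the sketch below, `FakeResult` is a stand-in I made up so the example runs without a server; I am only assuming that the real result object is iterable and has a close() method, as the code above suggests. On Python 2.5 the with statement additionally needs `from __future__ import with_statement`.

```python
from contextlib import closing


class FakeResult(object):
    """Stand-in for a query result: iterable, and it wants to be closed."""

    def __init__(self, rows):
        self.rows = rows
        self.closed = False

    def __iter__(self):
        return iter(self.rows)

    def close(self):
        self.closed = True


result = FakeResult(["http://cata.log/sacklpicker", "http://cata.log/tomcat"])
with closing(result) as ts:
    # close() is called on leaving this block, even if an exception is raised
    cats = [row for row in ts]
```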
The documentation also advises you to close that connection object at the end. So we will do exactly that:
Bulk Loading
Of course it is also possible to load triples from a file (e.g. in RDF/XML or N3 format) and send them to the server:
baseURI = "http://rho.whatever/"
from franz.openrdf.rio.rdfformat import RDFFormat
c.add(path, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)
print "Triple count: ", c.size()
A protocol trace with Wireshark tells me that the file is parsed locally on the client, its content is encoded into JSON, and that is then sent to the HTTP server. Quite surprisingly, this is not as slow as one would suspect.
Still, you will run into problems once you hit a certain size limit, in my case a meager 500,000 triples.
Obviously, larger files will have to be chunked by the application for the time being, until this is handled by the AllegroGraph Python client.
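Until the client handles this itself, the chunking could look roughly like this. The sketch relies on N-Triples being a line-based format (one triple per line), so it does not carry over to RDF/XML; the function name `split_ntriples` and the chunk file naming are made up for illustration.

```python
import os


def split_ntriples(path, chunk_size, directory):
    """Split a line-based N-Triples file into files of at most chunk_size lines.

    Returns the list of chunk file names, in order.
    """
    chunk_paths = []
    out = None
    count = 0
    try:
        with open(path) as source:
            for line in source:
                if not line.strip():
                    continue  # skip blank lines
                if out is None or count == chunk_size:
                    # start a new chunk file
                    if out is not None:
                        out.close()
                    name = os.path.join(
                        directory, "chunk-%04d.nt" % len(chunk_paths))
                    out = open(name, "w")
                    chunk_paths.append(name)
                    count = 0
                out.write(line)
                count += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each resulting file could then be handed to c.add(...) in turn, in the same way as the single file above. One caveat: blank node labels are scoped to a single document, so a file that uses blank nodes may not survive such a split unchanged.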
Also notable is that the server grows quite drastically in memory size:
13889 rho 730m 367m ./AllegroGraphServer ....
One problem I ran into was that from then on the store behaved strangely. As soon as I tried to RENEW it with
I always received an error
large for this store. Store size is 1118020.
The only way to get rid of that was to remove the repository manually from the file system, restart the server and repopulate the content. Not pretty.
Bulk Export
I had no problems getting the triples out of the store:
c.exportStatements(None, RDF.TYPE, Cat, False, NTriplesWriter(None))
So What Now?
Well, my main motivation to track the progress of AllegroGraph is to find a large-scale backend store for my geosemantic information, that is, time series for environment observations and derived values (virtual sensors).
For that I would need to encode geospatial information. That is covered for Lisp clients and for the Java client. I still know too little about AllegroGraph to see how this can be done in Python. And then ultimately reproduced in a Perl client.
