(continued from Part III)
Now that I know where the documents are located in the landscape, I have experimented with ways to estimate where the topic map topics are supposed to sit. My hypothesis is that if I can determine the distance of each document to every topic, I can triangulate the topics' positions.
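To sketch the idea of triangulating a position from known distances (this is only an illustration in pure Python, with an invented function name, not my actual implementation): given documents with known 2-D positions and an estimated distance from each to a topic, the topic position follows from a linearized least-squares fit.

```python
def triangulate(anchors, dists):
    """Estimate a 2-D position from known anchor positions and distances.

    Linearizes |x - p_i|^2 = d_i^2 against the first anchor, yielding an
    overdetermined linear system, then solves its 2x2 normal equations.
    Needs at least three non-collinear anchors.
    """
    (x0, y0), d0 = anchors[0], dists[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        # 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    # Normal equations (A^T A) x = A^T b, solved directly for the 2x2 case.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With three documents at (0,0), (4,0) and (0,4) and distances measured from the point (1,2), the fit recovers (1,2). In practice the distances derived from the corpus will be noisy, which is exactly why an overdetermined least-squares formulation is preferable to intersecting circles.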
Below (larger version in the attachments) is a new rendering of the MapReduce theme:
It shows the themes derived from the semantic corpus (documents + semantic network). Compare this with the positions of topics:
(continued from HD Semantic Maps)
Like most of you, I collect bookmarks. But unlike most of you, I store them in a semantic network, a topic map to be precise.
One problem I certainly share with you is that all these laboriously collected links are prone to break. Recovering them sometimes takes considerable effort and - according to yet another of Murphy's laws (are there actually any other laws?) - always hits you at the most inappropriate time.
Lately I have invested more work in the backend server (TM::IP) so that it also hosts the document positions: the positions of those documents which - together with the underlying semantic network - form the landscape.
The theme is still MapReduce, but with considerably more content than before.
Seamless document access
Onto Seadragon I then layered a bit of mouse-hover logic to preview HTML and PDF pages directly on top of the map.
But if you break it, you buy it.
The API is not completely stable yet; first I will have to integrate the piece into my semantic map generation infrastructure.
I also need to better understand how to deal with very sparse maps.
(continued from Part I)
One of the questions you might rightfully ask is how much impact the semantic network information within the topic map has on producing visualisations like those below:
Or how much impact it should have, as this is a parameter I must be able to control.
This is my first stab at a realistic data set (see the attachments for the original resolution):
It shows the landscape around the theme MapReduce, a cloud computing technology about which semantic web people may or may not have heard.
I had mentioned earlier that I have now reorganized my new TM server (based on Catalyst/mod_perl/Apache) along the REST paradigm. In my case this means that not only TM data, but also the documents attached to it, vector spaces, and so forth are exposed RESTfullish.
At first this appeared to be more RESTfoolish, as it was quite difficult to squeeze everything into a GET/PUT/POST corset. It also was much more work than I had planned to invest, mostly because not only the original resources but also all machine learning processes have to be exposed, if only via their configuration parameters. And they have plenty of those.
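To give a flavour of what such a corset looks like, here is a hypothetical sketch (the paths and the matcher are invented for illustration; my server does this with Catalyst controllers in Perl): every artefact, including the configuration of a learning process, becomes its own addressable resource.

```python
# Hypothetical route table: method + path template -> meaning of the resource.
ROUTES = {
    ("GET",  "/maps/{id}"):                   "fetch rendered map (PNG)",
    ("GET",  "/maps/{id}/documents"):         "list documents in the landscape",
    ("PUT",  "/documents/{id}"):              "add or replace a document",
    ("GET",  "/spaces/{id}"):                 "fetch a vector space",
    ("POST", "/processes/clustering"):        "trigger a learning run",
    ("PUT",  "/processes/clustering/config"): "set its parameters",
}

def match(method, path, routes=ROUTES):
    """Match a concrete request against the templated routes above.

    A template segment in braces ({id}) matches any concrete segment;
    everything else must match literally.
    """
    for (m, template), meaning in routes.items():
        if m != method:
            continue
        t_parts = template.strip("/").split("/")
        p_parts = path.strip("/").split("/")
        if len(t_parts) == len(p_parts) and all(
            t.startswith("{") or t == p for t, p in zip(t_parts, p_parts)
        ):
            return meaning
    return None  # would become a 404
```

So `GET /maps/42` resolves to the rendered map, while the learning machinery hides behind plain resources like `/processes/clustering/config` instead of RPC-style verbs.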
But I seem to have reaped the benefits much earlier than anticipated. Read on.
One of the problems I have to solve for my infrastructure is to compute semantic landscapes ("SemScapes", if you were so marketing-inclined) with an efficient computation model.
If, for instance, a user has added a new document to the document corpus, then new feature vectors, then new vector spaces, new convergence models, new landscapes and finally new maps (as images) have to be generated.
There is a dependency graph here, quite similar to the one you are used to from tools like make.
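The chain above can be sketched as such a graph; this is only a toy model with invented stage names (the real stages live inside TM::IP), showing which artefacts a change invalidates, make-style:

```python
from collections import deque

# Hypothetical stage names: each artefact lists what it is derived from.
DEPENDS = {
    "feature_vectors": ["corpus"],
    "vector_space":    ["feature_vectors"],
    "convergence":     ["vector_space"],
    "landscape":       ["convergence"],
    "map_png":         ["landscape"],
}

def rebuild_order(changed, depends=DEPENDS):
    """List every artefact that must be regenerated, in build order,
    after `changed` (e.g. "corpus") has been touched.

    A breadth-first walk over the inverted graph suffices for a simple
    chain like this; a general DAG would need a topological sort.
    """
    consumers = {}
    for node, deps in depends.items():
        for d in deps:
            consumers.setdefault(d, []).append(node)
    order, seen, queue = [], {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in consumers.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

Touching the corpus thus forces the whole chain to be rebuilt, whereas touching only the convergence model would leave feature vectors and vector spaces alone.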
One of the many pieces in my puzzle are the surfaces of topic maps. Once computed, these are simply PNG files. There will be different resolutions of these surfaces (maps), depending on how much content is involved.
As I want to integrate this into my TM::IP landscape, the best approach is to follow the pattern I used with TM::IP::Documents and have another Catalyst controller do the work.
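Since Seadragon consumes its imagery as a tile pyramid, the different surface resolutions map naturally onto pyramid levels. A rough back-of-the-envelope sketch (invented function, not part of TM::IP) of how many levels such a pyramid needs:

```python
def pyramid_levels(width, height, tile=256):
    """Rough count of zoom levels for a tile pyramid over a surface PNG:
    halve the image repeatedly until it fits into a single tile.

    This is a simplification for illustration; real Deep-Zoom-style
    pyramids have their own exact level-numbering scheme.
    """
    levels = 1
    while max(width, height) > tile:
        # Halve both dimensions, rounding up so nothing is lost at edges.
        width, height = (width + 1) // 2, (height + 1) // 2
        levels += 1
    return levels
```

A 1024x512 surface would need three such levels, while a surface that already fits into one 256-pixel tile needs only one; the controller can then serve each level as its own resource.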