Tutorial: TM Semantic Visualisation (Part IV)
(continued from Part III)
Last time I left off by showing you how individual documents are blended into a landscape that is computed from a topic map.
So far I have ignored the topological structure of the topic map itself and computed the landscape only from the terms within the documents. But my ultimate goal is (and was) to visualize whole topic maps, not just the text corpus.
Topic Maps Impact
For that I compute a measure of how strongly a particular document (inline data or content within a URL-referenced document) is attached to a topic. Via occurrences and associations, not only the closest topic but also all neighboring topics have an impact. All of that is weighted by the entropy value an association has within the map.
I then use this measure to strengthen the topic names in the convergence process.
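To make the idea a bit more tangible, here is a minimal sketch of what such a measure could look like. Treat it as an illustration under assumptions, not as the code behind the pictures: the function names, the tuple-based input format, the `damping` factor and the one-hop neighborhood are all made up for this example, and I read "entropy value" here as the information content of an association type (rarer types weigh more), which is only one possible interpretation.

```python
import math
from collections import Counter, defaultdict

def association_weights(associations):
    """Weight each association type by its information content in this map.

    associations: list of (topic_a, topic_b, assoc_type) tuples (assumed format).
    Assumption: "entropy value" is taken as -log2(p(type)), normalized to [0, 1].
    """
    counts = Counter(assoc_type for _, _, assoc_type in associations)
    total = sum(counts.values())
    info = {t: -math.log2(c / total) for t, c in counts.items()}
    max_info = max(info.values(), default=0.0) or 1.0  # avoid division by zero
    return {t: v / max_info for t, v in info.items()}

def attachment(occurrences, associations, damping=0.5):
    """Return {(topic, document): strength}.

    occurrences: list of (topic, document) tuples (assumed format).
    A direct occurrence counts 1.0; a topic one association away receives
    damping * weight(association type). `damping` is a hypothetical parameter.
    """
    weights = association_weights(associations)
    strength = defaultdict(float)
    for topic, doc in occurrences:
        strength[(topic, doc)] += 1.0
    for a, b, assoc_type in associations:
        # associations are treated as undirected: propagate both ways
        for src, dst in ((a, b), (b, a)):
            for topic, doc in occurrences:
                if topic == src:
                    strength[(dst, doc)] += damping * weights[assoc_type]
    return dict(strength)
```

With one document attached to topic `A` and associations `A-B`, `A-C` (both of a common type) and `B-C` (of a rare type), `A` gets the full strength for that document, while `B` and `C` get a smaller share that depends on how common the connecting association type is.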
To see how this works let us return to the bipolar topic map, which contains only the topics AAA and NNN. Both have documents attached, but the two document sets are completely disjoint (although identical in internal topology).
When looking only at these documents, the highest intensities are at WWW and JJJ, respectively. As there is no overlap in terminology, the mountain ranges are disjoint. But if you look closer you see strong similarities between the two: YYY corresponds to LLL, OOO to BBB, XXX to KKK, etc.
Now let's turn on the topic map impact:
With the topic names pushed up, they now dominate large areas. Only the originally strong JJJ/WWW and XXX/KKK hot spots survived the change. Underneath the surface the internal topology remained mostly the same, although it sharpened a bit, pulling the documents into a somewhat tighter area.
The second topic map I experiment with is a variation of the first, namely with some common documents shared between the two topics. At some stage a document-only rendering looks like this:
You will notice some small elevations around RRR and AAA, both of which are created by shared documents being quite intense there. The rest is the usual two-continent landscape.
Now let us turn on the topic map impact:
Again the overall structure remains the same, and again you can see that the topics not only dominate the picture but also gently aggregate the documents, deepening the divides.
One can even increase this effect by turning on topic mappishness from the very beginning:
This works because the network is more amenable to learning when it is young and less biased (go figure).
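A small sketch may show why timing matters. It assumes the convergence process behaves like a self-organizing map with a decaying learning rate, which is a common setup but an assumption on my side here; `learning_rate`, `update` and the `boost` parameter (standing in for the topic map impact) are hypothetical names chosen for this example only.

```python
def learning_rate(step, total_steps, start=0.5, end=0.01):
    """Exponentially decaying learning rate (assumed schedule)."""
    return start * (end / start) ** (step / total_steps)

def update(weight, target, step, total_steps, boost=1.0):
    """Pull a node weight toward a target value.

    `boost` is a hypothetical multiplier representing the topic map impact;
    the effective rate is capped at 1.0 so the weight never overshoots.
    """
    rate = min(1.0, learning_rate(step, total_steps) * boost)
    return weight + rate * (target - weight)

# Same boost, applied to a young and to an old network
early = update(0.0, 1.0, step=0, total_steps=100, boost=2.0)
late = update(0.0, 1.0, step=100, total_steps=100, boost=2.0)
```

In this toy setup the same boost moves the weight much further at step 0 than at the final step, simply because the learning rate has not yet decayed.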
And So What?
For what it's worth, I have proven to myself that one can visualize a topic map while respecting both the topological aspect of the map and the text information from the involved documents.
Of course there is much to say about the practical value of this.