Tutorial: TM Semantic Visualisation (Part I)
Before we look at the impact that the topology of a topic map has on the visualisation, we first consider only the text content itself. To have better control over it, we will stick to purely synthetic maps, i.e. those whose text follows patterns we control.
Our first map consists of only two topics:
For each of the topics we will add the same number of documents, each of exactly the same size and each consisting of random words. Lazily, I attach these documents as text occurrences to each topic.
For the topic AAA we will cleverly use only the words AAA to MMM, and for the other only the words NNN to ZZZ. Additionally, we do it in such a way that every document in AAA has a corresponding counterpart in NNN, so that a document BBB DDD MMM .... will match an OOO QQQ ZZZ .... The document structure of the two topics is then actually identical (attachment bipolar-ident.atm).
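To make the construction concrete, here is a minimal Python sketch of how such a mirrored corpus could be generated. It is not the code used to build the attached map: the function name make_mirrored_corpus, the document count, the document length and the seed are all assumptions for illustration. Only the AAA..MMM / NNN..ZZZ split and the one-to-one correspondence come from the text above.

```python
import random
import string

# The 26 synthetic terms AAA, BBB, ..., ZZZ.
TERMS = [letter * 3 for letter in string.ascii_uppercase]
FIRST_HALF = TERMS[:13]    # AAA .. MMM, used by the first topic
SECOND_HALF = TERMS[13:]   # NNN .. ZZZ, used by the second topic

# Map every first-half term onto its counterpart: AAA -> NNN, ..., MMM -> ZZZ.
MIRROR = dict(zip(FIRST_HALF, SECOND_HALF))


def make_mirrored_corpus(n_docs=20, doc_len=50, seed=0):
    """Return two lists of documents with identical structure.

    n_docs, doc_len and seed are hypothetical parameters; the tutorial
    does not state the actual values.
    """
    rng = random.Random(seed)
    # Random documents over the first half of the vocabulary.
    docs_a = [[rng.choice(FIRST_HALF) for _ in range(doc_len)]
              for _ in range(n_docs)]
    # Each counterpart is the same document, translated term by term.
    docs_n = [[MIRROR[word] for word in doc] for doc in docs_a]
    return docs_a, docs_n
```

Because the second list is a pure term-by-term translation of the first, both topics carry the same number of documents, the same document sizes and the same internal term statistics, while sharing no term at all.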
When we look at the visualisation above (see the attachment for a larger version), we would expect the areas of both topics to be pretty much equal. And this is indeed the case: both mountain ranges have roughly the same lateral extent, and as there are no term commonalities, deep gaps separate them.
As there is no connection at all, there is actually no constraint on how the two mountains are positioned relative to each other. They are like tectonic plates: it is quite random where they end up if you recompute the landscape.
Everything in green is an area of low term intensity, i.e. there is no term in the document corpus which has any significant intensity there. Shading into brown, the intensity increases. Accordingly, JJJ in the left mountain has one particular spot where its intensity is highest.
Note that this does not necessarily mean that JJJ is also overall the most frequent term. Here it actually is not (LLL is). But it is the most frequent in a certain context, the context provided by the rest of the mountain.
Document-wise this means that JJJ is the term which appears somewhat more frequently in a certain number of similar documents. As similar documents support elevations, they will push JJJ onto the peak.
So this is not just a term counting exercise. And it may reveal surprising results.
Actually, every term has a certain intensity on every spot, although only at a few spots will the term be pushed right onto the top and so become visible. All others will stay under the radar.
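The idea of one winning term per spot can be sketched in a few lines of Python. This is only an illustration under assumptions: the landscape is taken to be a rectangular grid, intensities are given as one 2D list per term, and the function name top_terms and the threshold parameter (standing in for the green, low-intensity areas) are hypothetical.

```python
def top_terms(intensity, threshold=0.0):
    """Pick the strongest term at every spot of a grid.

    intensity maps each term to a 2D list of floats, all of the same shape
    (an assumed data layout). Spots where even the strongest term does not
    exceed the threshold yield None, i.e. nothing becomes visible there.
    """
    terms = sorted(intensity)
    if not terms:
        return []
    first = intensity[terms[0]]
    rows, cols = len(first), len(first[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # The term with the highest intensity wins this spot.
            best = max(terms, key=lambda t: intensity[t][r][c])
            row.append(best if intensity[best][r][c] > threshold else None)
        result.append(row)
    return result
```

With intensities {"JJJ": [[0.9, 0.1]], "LLL": [[0.4, 0.2]]} and a threshold of 0.15, JJJ wins the first spot and LLL the second, even though LLL is weaker than JJJ's peak.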
Only if such a top term occupies several connected spots (building a region) will I display the term in the map. The larger that connected region, the larger the font will be, so that the font size is an indication of the regional importance of a term.
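One way to read this rule is as connected-component labelling on a grid, followed by a mapping from region size to font size. The Python sketch below works under that reading and is not the actual implementation: the grid representation, 4-connectivity, the minimum of two cells, and the linear font formula with its base and step values are all assumptions.

```python
from collections import deque


def term_regions(grid, min_cells=2):
    """Find 4-connected regions of spots that share the same top term.

    grid is a 2D list holding the top term of each spot, or None where no
    term is visible (assumed layout). Returns (term, size) tuples for all
    regions with at least min_cells spots, in scan order.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            term = grid[r][c]
            if term is None or seen[r][c]:
                continue
            # Breadth-first flood fill over neighbours with the same term.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == term):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # A single isolated spot does not count as a region.
            if size >= min_cells:
                regions.append((term, size))
    return regions


def font_size(region_cells, base=8.0, step=2.0):
    """Hypothetical linear mapping from region size to font size."""
    return base + step * region_cells
```

In this sketch a term that tops only one isolated spot is dropped, and a term covering three connected spots gets a larger label than one covering two.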
That may or may not coincide with a local importance. In this sense I am conflating two aspects of term relevance: the local intensity and the regional extension. Maybe we need a better approach here.