High-Definition Semantic Maps (Part II)
(continued from Part I)
One question you might rightfully ask is how much impact the semantic network information within the topic map has on producing visualisations like those below:
Or how much impact it should have, as this is a parameter I have to control.
Pure Text-Based Machine Learning
For the sake of the experiment I have turned off the impact the semantic network has on the map rendering process and used only classical text-clustering techniques (see attachments for larger versions):
As in the rendering above, we can easily recognize large regions dominated by the terms MapReduce, Hadoop, Google and cloud computing. These regions and their relative placement make sense, albeit only to a degree, and only if you already understand what is going on in terms of MapReduce.
Additionally, we see that several inflationary terms have crept in: data, map, reduce, import or job. This is not surprising, given that MapReduce revolves around these concepts. But more unrelated terms such as comment, blog or views will also become more frequent, especially when the content is harvested from the web.
With this democratic way of treating all words equally, the map is taken hostage by the majority terms, resulting in a more confusing, noisy picture. Consequently, these maps have given a rather inconclusive impression in the past.
We need to overcome this effect.
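To make the majority effect concrete, here is a minimal sketch of what treating every word equally looks like. It is not the clustering code behind the maps; the function name term_weights and the three sample sentences are made up for illustration, and plain relative term frequency stands in for whatever weighting a real text-clustering pipeline would use.

```python
import re
from collections import Counter


def term_weights(documents):
    """Relative frequency of each term over all documents.

    Every word counts the same, no matter whether it names a
    topic (hadoop) or is just page furniture (comment, blog).
    """
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Hypothetical sample texts, as they might be harvested from the web.
docs = [
    "MapReduce job reads data and runs map then reduce",
    "Hadoop job imports data for map and reduce steps",
    "Leave a comment on this blog about data and views",
]

weights = term_weights(docs)

# Print terms from most to least frequent.
for term in sorted(weights, key=weights.get, reverse=True):
    print(f"{term:10s} {weights[term]:.3f}")
```

In this toy corpus the generic word data occurs more often than either mapreduce or hadoop, so it outweighs both, which is the same kind of takeover by majority terms that is described above.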
To counter the word noise I now use the semantic network. Interestingly, the mere existence of the proper topic nodes and their labels provides enough crystallisation points for a distinct aggregation.
To prove this point, I have varied the Topic Maps impact from 0.00 (text only) to 0.50 (very strong influence from the topic map) and have attached the images below. As you can see, aggregation is already strong at very low semantic impact levels. And if you overdo it, you simply enlarge the dominant areas, losing interesting details.
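One way to picture such an impact parameter is as a simple blend between text-derived term weights and the labels of the topic nodes. The sketch below is only an assumption about how a knob like this could work, not the actual rendering code: the function blend_weights, the linear interpolation, and the example numbers are all hypothetical.

```python
def blend_weights(text_weights, topic_labels, impact):
    """Blend text-only term weights with topic-map labels.

    impact = 0.0 keeps the pure text weights; larger values shift
    weight toward terms that are labels of topic nodes.
    Assumption: plain linear interpolation, with the topic-map
    share spread evenly over all labels.
    """
    if not 0.0 <= impact <= 1.0:
        raise ValueError("impact must lie between 0.0 and 1.0")
    labels = {label.lower() for label in topic_labels}
    label_share = 1.0 / len(labels) if labels else 0.0
    terms = set(text_weights) | labels
    return {
        term: (1.0 - impact) * text_weights.get(term, 0.0)
        + impact * (label_share if term in labels else 0.0)
        for term in terms
    }


# Hypothetical text-only weights, dominated by generic terms.
text_weights = {"data": 0.4, "job": 0.3, "mapreduce": 0.2, "hadoop": 0.1}
topic_labels = ["MapReduce", "Hadoop"]

for impact in (0.0, 0.1, 0.5):
    blended = blend_weights(text_weights, topic_labels, impact)
    ranking = sorted(blended, key=blended.get, reverse=True)
    print(f"impact={impact:.2f}", ranking)
```

With these made-up numbers, the topic labels already gain ground at an impact of 0.1, and at 0.5 they outweigh every generic term. That mirrors the observation that a little semantic influence goes a long way, while too much lets the dominant topics crowd out everything else.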
The Gentle Tap Onto the Forehead
What that means in practice is, I am inclined to believe, that you do not need a sophisticated and deep ontology. A lightweight set of topic nodes and labels is enough, which makes this a perfect fit for Topic Maps and SKOS.