High-Definition Semantic Maps (Part I)
This is my first stab at a realistic data set (see the attachments for the original resolution):
It shows the landscape around the theme MapReduce, a cloud computing technology which Semantic Web people may or may not have heard about. In either case, the landscape tries to paint an intuitive picture of the topics involved:
- MapReduce, the computing principle;
- Hadoop, the Java implementation of MapReduce, along with satellite technologies such as HadoopDB, Pig, Dumbo or Mahout;
- Cloud computing as offered by Amazon (EC2), Yahoo or Cloudera and the like;
- How Google (which promoted the paradigm first) fits into the picture (sic!);
- And the many other software packages which implement the MapReduce processing method.
If your favourite software package is missing, then either (a) I have simply missed recording it in my semantic network, or (b) there was not enough text information for the machinery to push the topic to the surface, or (c) the map resolution chosen for this demo does not allow that degree of detail.
What This Map Shows
As with natural landscapes, the area a topic covers is a direct measure of how relevant that topic is in relation to the whole.
The mapreduce range in the south-east (SE) corner occupies around 1/8th of the overall real estate, implying that the topic carries about that much weight within the semantic network considered. Strictly speaking, this mountain range covers predominantly "MapReduce, the computation paradigm". "MapReduce as used and seen by Google" is a different aspect, located at the north edge. The aspect "MapReduce software" is also separate (around the north-east).
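To make the "area equals relevance" reading concrete, here is a minimal Python sketch. It assumes the map can be treated as a grid of cells, each tagged with its dominant topic; the grid, the labels and the function name area_shares are made up for illustration and are not the machinery that produced the map.

```python
from collections import Counter


def area_shares(label_grid):
    """Return the fraction of map cells assigned to each topic label."""
    counts = Counter(label for row in label_grid for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical 4x4 map: each cell carries the label of its dominant topic.
grid = [
    ["google", "google", "software", "software"],
    ["dryad", "hive", "hadoop", "hadoop"],
    ["pig", "hive", "hadoop", "hadoop"],
    ["pig", "dryad", "mapreduce", "mapreduce"],
]

shares = area_shares(grid)
print(shares["mapreduce"])  # 2 of 16 cells
```

In this toy grid the mapreduce cells make up 2 of 16 cells, which mirrors the 1/8th share mentioned above.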
The color coding symbolizes topic intensity. Looking again at the mapreduce range in the south-east (SE) corner, there is evidently a lot of content covering predominantly that very topic (and nothing else). Content here means the information within the semantic network and the documents that it mentions.
The alert reader will notice that the landscape wraps around at the edges: follow it south, and it continues in the north; follow it east, and it connects in the west (torus topology).
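The wrap-around can be sketched in a few lines of Python. This is only an illustration of torus topology on a rectangular grid; the function names wrap and torus_distance and the grid dimensions are assumptions of mine, not part of the mapping software.

```python
def wrap(x, y, width, height):
    """Map any cell coordinate back onto the grid, wrapping at the edges."""
    return x % width, y % height


def torus_distance(a, b, width, height):
    """Shortest Euclidean distance between two cells on a wrapped grid."""
    dx = abs(a[0] - b[0]) % width
    dy = abs(a[1] - b[1]) % height
    # On a torus the short way round may lead across an edge.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return (dx * dx + dy * dy) ** 0.5


# Stepping off the east and north edges of a hypothetical 10x8 grid.
print(wrap(10, -1, 10, 8))
# Cells on opposite edges are neighbours on the torus.
print(torus_distance((0, 0), (9, 0), 10, 8))
```

The practical consequence is that two features sitting at opposite borders of the picture can in fact be close neighbours.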
A Virtual Semantic Tour
Let's look closer at the map:
The SE mountain range labelled mapreduce covers mostly conceptual material: many blog entries introducing the basic idea behind the paradigm and cloud computing per se. Towards the south, the relationship between cloud computing and conventional databases becomes more emphasized.
Following the ledge northwards leads to the Hadoop shoulder. Documents there are Hadoop's own documentation, tutorials and experience reports. Further north these reports are more about operational issues and cloud computing in general, be it with Amazon's Elastic Compute Cloud or using Cloudera (not much about Yahoo here, yet). A few documents also cover performance issues and are therefore visible there.
The massive mapreduce block in the north-west is Google's position. Unsurprisingly, one will find the discussion regarding related patents there. Separate from Google's sphere of influence, Microsoft's Dryad sits within the plain. Dryad's importance clearly pales compared to the rest, both in terms of real estate and document intensity.
Around the equator you will find many more hills concerned with rather small software packages: some are satellites to Hadoop (such as HadoopDB, HDFS or Hive), some are more separate, such as Pig or Gisting. I have not collected much information about most of these, hence their low intensities in the map.
And The Documents?
It is also possible to position documents where they fit best into the landscape (again, better look at the larger versions below):
Each little circle represents one document, with the circle size corresponding to the impact that document has on the landscape.
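How exactly impact translates into circle size is not spelled out here. One common convention, shown purely as an assumption, is to make the circle's area (rather than its radius) proportional to the impact, so that a document with four times the impact gets twice the radius. The function name circle_radius and the parameter max_radius are hypothetical.

```python
import math


def circle_radius(impact, max_impact, max_radius=10.0):
    """Scale a document's impact to a radius so that area grows linearly with impact."""
    if max_impact <= 0:
        raise ValueError("max_impact must be positive")
    # Clamp to the valid range so stray values cannot produce oversized circles.
    impact = max(0.0, min(impact, max_impact))
    return max_radius * math.sqrt(impact / max_impact)


# A document with a quarter of the maximum impact gets half the maximum radius.
print(circle_radius(25, 100))
```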
Displaying document details is still unfinished business.