Content Landscape (Part II)

In the first installment of this micro-series I tried to make the point that different content paradigms exist for a reason, and that this fact of life should be accepted, not ignored.

Each paradigm organizes content in its own way: as a set of tables, a hierarchy of object classes, a tree-like structure, a graph (semantic network), or simply as natural language text.

It is also worth noting that the paradigms can cope with different levels of irregularity, which is probably exactly why they evolved historically the way they did. Tables adhere to a rather regular form, whereas arbitrary natural language text can be quite irregular. In my figure, anything to the right of Text would already be noise.

http://kill.devc.at/system/files/content-theory-small.jpg

So far, all of this concerns the level at which content is modelled; it is not necessarily how that content is eventually persisted.

Storage Paradigms

Dedicated technologies have been created for each of the above content paradigms over the last 30+ years. RDBMSes are perfectly suited to relational data, and OODBMSes are tailored to objects, many of them specifically to those used in OO programming languages. Nowadays we also see dedicated XML stores on the market, and we can expect more technologies designed specifically to hold graphs (often referred to as Tuple Stores).
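To make the idea of a tuple store a little more concrete, here is a toy in-memory sketch. It assumes that a graph is held as a set of (subject, predicate, object) tuples, as in RDF; the class name `TripleStore` and its methods are made up for illustration. Real stores add several indexes, persistence and a query language such as SPARQL.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Hypothetical toy graph store holding (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        # Index by subject so that "everything about X" avoids a full scan.
        self.by_subject: dict[str, set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        t = (s, p, o)
        self.triples.add(t)
        self.by_subject[s].add(t)

    def match(
        self, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None
    ) -> list[Triple]:
        # None acts as a wildcard in any position; results are sorted for stable output.
        candidates = self.by_subject.get(s, set()) if s is not None else self.triples
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )


g = TripleStore()
g.add("alice", "knows", "bob")
g.add("bob", "knows", "carol")
g.add("alice", "worksFor", "acme")
print(g.match(p="knows"))  # [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
print(g.match(s="alice"))  # [('alice', 'knows', 'bob'), ('alice', 'worksFor', 'acme')]
```

Note that nothing forces two subjects to share the same predicates; that freedom is exactly what a fixed table schema does not offer.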

The middle row of the figure also lists fulltext engines: they are specialized to hold fulltext documents, possibly enriched with metadata used to tag and classify them.

Each of these technologies was created with a particular paradigm in mind, which has the advantage that any structure inherent in the paradigm can be hardcoded to achieve the best performance. A relational database system, for instance, makes the table a first-class concept as its ontological commitment. Consequently, the software can rely on every row in a table having exactly the same structure. Other types of stores make other structural commitments.
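The following toy sketch illustrates what such a commitment buys. A hypothetical `FixedSchemaTable` checks every row against a declared schema when it is written, so reads can resolve a column name to a fixed position once and never inspect the shape of individual rows. It only illustrates the idea; it is not how any real RDBMS lays out its data.

```python
from typing import Any, Sequence


class FixedSchemaTable:
    """Hypothetical toy row store: every row must match the declared schema exactly."""

    def __init__(self, columns: Sequence[tuple[str, type]]) -> None:
        self.names = [name for name, _ in columns]
        self.types = [typ for _, typ in columns]
        # Column positions are fixed, so a name can be resolved to an index once.
        self.index = {name: i for i, name in enumerate(self.names)}
        self.rows: list[tuple[Any, ...]] = []

    def insert(self, *values: Any) -> None:
        # The structural commitment is enforced at write time...
        if len(values) != len(self.types):
            raise ValueError(f"expected {len(self.types)} values, got {len(values)}")
        for name, typ, value in zip(self.names, self.types, values):
            if not isinstance(value, typ):
                raise TypeError(f"column {name!r} expects {typ.__name__}")
        self.rows.append(tuple(values))

    def column(self, name: str) -> list[Any]:
        # ...so reads can use the same position for every row without checking.
        i = self.index[name]
        return [row[i] for row in self.rows]


people = FixedSchemaTable([("name", str), ("age", int)])
people.insert("Alice", 34)
people.insert("Bob", 27)
print(people.column("age"))  # [34, 27]
```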

Crossovers

The fact that different technological commitments have been made for each paradigm does not rule out hosting information in a storage technology that was designed for a different paradigm.

For many years programmers have used relational databases to park their objects, despite the well-known impedance mismatch. And while object-oriented databases never made it into the mainstream, XML stores are increasingly used to hold object instance data, especially since the advent of SOA.
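A small sketch of that impedance mismatch, using Python's built-in `sqlite3` module: a hypothetical `Customer` object that owns a list of e-mail addresses cannot live in a single row, so it is split across two tables on save and reassembled with a join on load. The class, table and column names are assumptions for illustration; this is roughly the kind of bookkeeping an object-relational mapper hides from the programmer.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Hypothetical domain object: one customer owning a list of e-mail addresses.
    id: int
    name: str
    emails: list[str] = field(default_factory=list)


def save(conn: sqlite3.Connection, c: Customer) -> None:
    # A row cannot hold a list, so the object is split across two tables.
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (c.id, c.name))
    conn.executemany(
        "INSERT INTO email (customer_id, address) VALUES (?, ?)",
        [(c.id, e) for e in c.emails],
    )


def load(conn: sqlite3.Connection, customer_id: int) -> Customer:
    # Reassembling the object needs a join plus regrouping in application code.
    rows = conn.execute(
        "SELECT c.name, e.address FROM customer c "
        "LEFT JOIN email e ON e.customer_id = c.id "
        "WHERE c.id = ? ORDER BY e.rowid",
        (customer_id,),
    ).fetchall()
    if not rows:
        raise KeyError(customer_id)
    emails = [address for _, address in rows if address is not None]
    return Customer(customer_id, rows[0][0], emails)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE email (customer_id INTEGER REFERENCES customer(id), address TEXT)")
save(conn, Customer(1, "Alice", ["a@example.org", "alice@example.com"]))
print(load(conn, 1))
# Customer(id=1, name='Alice', emails=['a@example.org', 'alice@example.com'])
```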

And there have been massive investments in persisting XML structures within relational databases; the same holds true for storing graph information (specifically RDF and Topic Maps) and fulltext content.
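One simple way to persist XML in a relational database, sometimes called a node or edge table, stores every element as a row pointing to its parent. The sketch below assumes that scheme with a hypothetical `node` table; it ignores attributes, mixed content and tail text, and commercial XML-in-RDBMS products use considerably more elaborate encodings. It shows where the mismatch surfaces: a path such as /library/book/title turns into a chain of self-joins.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store every element as one row (id, parent_id, tag, text)."""

    def visit(elem: ET.Element, parent_id: Optional[int]) -> None:
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO node (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, text),
        )
        node_id = cur.lastrowid
        # Ids are assigned in document (pre-)order, so ORDER BY id keeps sibling order.
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)")
doc = (
    "<library>"
    "<book><title>Dune</title><author>Frank Herbert</author></book>"
    "<book><title>Solaris</title><author>Stanislaw Lem</author></book>"
    "</library>"
)
shred(conn, doc)

# The path /library/book/title becomes one self-join per step.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT t.text FROM node l "
        "JOIN node b ON b.parent_id = l.id AND b.tag = 'book' "
        "JOIN node t ON t.parent_id = b.id AND t.tag = 'title' "
        "WHERE l.tag = 'library' AND l.parent_id IS NULL "
        "ORDER BY t.id"
    )
]
print(titles)  # ['Dune', 'Solaris']
```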

Of course, in any configuration without a perfect fit there is an impedance price to pay, be it the added complexity of the mapping or a lost optimization opportunity. Today, which storage technology an organization prefers is mostly a tactical decision: the impedance mismatch is simply traded off against the cost of maintaining several databases, in terms of technology or licences.

But that cannot be the end of the line. We have to do better than that. This is what the third part will be about.
