Content Landscape (Part I)

(This is a tangential, marginally interesting write-up of a more holistic view of content, i.e. the material most of us are dealing with here. The Topic Map aspect will hopefully become clearer in a later instalment.)

Every modern enterprise maintains its core data in some database, most commonly a relational one. In the world of database designers, all relevant enterprise information should be maintained in such structured form. This is then also the main objective of conventional Enterprise Information Integration (EII), and if this is achieved, then Enterprise Application Integration (EAI) can follow suit.

The reality is, though, that much enterprise information is document-centric, not data-centric as conventional RDBMSes assume. Document management systems and content management systems have therefore been the preferred solutions in that segment. These systems may or may not use a relational store at their base; what is important is the structure of the data and how it can be accessed. How it is actually stored is secondary.

This dichotomy between document- and data-centricity has been partly addressed with XML, which can describe tree-oriented structures. Documents could always be mapped naturally into tree form, with chapters, sections, etc. being nodes and the text sitting in the leaves of the tree. And tree structures can also capture table-oriented information, such as that inside relational databases.
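To make the two mappings concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (book, chapter, section, para, customers, row) and the sample data are invented for illustration; nothing here is prescribed by XML itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: chapters and sections are inner nodes, text sits in the leaves.
DOCUMENT = """
<book>
  <chapter title="Introduction">
    <section title="Scope"><para>What this text covers.</para></section>
    <section title="Audience"><para>Who should read it.</para></section>
  </chapter>
</book>
"""

# Hypothetical relational table: each row becomes a child, each column a grandchild.
TABLE = """
<customers>
  <row><id>1</id><name>Ada</name></row>
  <row><id>2</id><name>Grace</name></row>
</customers>
"""

def outline(node, depth=0):
    """Return the tree as indented lines, one line per element."""
    label = node.get("title") or (node.text or "").strip()
    lines = ["  " * depth + node.tag + (": " + label if label else "")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

def rows(table):
    """Read the table-shaped tree back into a list of dicts, one per row."""
    return [{col.tag: col.text for col in row} for row in table.findall("row")]

doc_lines = outline(ET.fromstring(DOCUMENT))
customer_rows = rows(ET.fromstring(TABLE))
print("\n".join(doc_lines))
print(customer_rows)
```

Both inputs go through the same parser and the same tree API; only the shape of the tree differs (deep and irregular for the document, shallow and repetitive for the table).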

This has put the spotlight on transformation and query languages such as XQuery and XSLT, which give access to a wide range of information and can quite flexibly transform it into other trees, tables and flat text.
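The three target shapes (table, flat text, another tree) can be sketched without an XSLT or XQuery processor. The following is a stand-in written against Python's standard xml.etree.ElementTree and its limited XPath subset, not XSLT or XQuery itself; the report, section, item, regions, region and figure element names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source tree, grouped by section.
SOURCE = """
<report>
  <section title="Sales">
    <item region="North">120</item>
    <item region="South">80</item>
  </section>
  <section title="Returns">
    <item region="North">7</item>
  </section>
</report>
"""

root = ET.fromstring(SOURCE)

# Transform 1: tree -> table (a list of tuples, one per item).
table = [
    (section.get("title"), item.get("region"), int(item.text))
    for section in root.findall("section")
    for item in section.findall("item")
]

# Transform 2: tree -> flat text, one line per item.
flat_text = "\n".join(f"{title}/{region}: {value}" for title, region, value in table)

# Transform 3: tree -> another tree, regrouped by region instead of by section.
by_region = ET.Element("regions")
for title, region, value in table:
    node = by_region.find(f"./region[@name='{region}']")
    if node is None:  # compare with None: an element without children is falsy
        node = ET.SubElement(by_region, "region", name=region)
    ET.SubElement(node, "figure", kind=title).text = str(value)

print(table)
print(flat_text)
print(ET.tostring(by_region, encoding="unicode"))
```

A real XSLT stylesheet would express the same regrouping declaratively; the point of the sketch is only that one source tree can feed all three output forms.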

So far, so good.

So Where's the Problem?

Ignoring XML's idiosyncrasies, one main problem with tree-oriented structures is that they are exactly this: tree structured. As such they can perfectly host narrative information (such as in documents) and iterative content (such as in tables), but not arbitrarily structured information. This becomes problematic when (a) within one XML document one piece of information has to reference another, or (b) one XML document has to reference information within another XML document.

While this indicates that the content naturally has a graph structure, XML authors have to emulate it by using XML IDs, or even by resorting to XLink, to implement links.
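The emulation can be illustrated with a small sketch in Python's standard xml.etree.ElementTree. The attribute names id and ref, and the person, paper, worksWith and authorOf elements, are hypothetical; ElementTree does not validate against a DTD and does not resolve such references, so the application has to build the index and follow the links itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "ref" attributes point at "id" attributes elsewhere in the tree.
SOURCE = """
<doc>
  <person id="p1" name="Ada"><worksWith ref="p2"/></person>
  <person id="p2" name="Grace"><worksWith ref="p1"/><authorOf ref="d1"/></person>
  <paper id="d1" title="Notes"/>
</doc>
"""

root = ET.fromstring(SOURCE)

# Step 1: build an index from id to element, in document order.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

# Step 2: turn every resolvable reference into a graph edge
# of the form (source id, relation, target id).
edges = []
for owner in by_id.values():
    for link in owner:
        target = link.get("ref")
        if target in by_id:
            edges.append((owner.get("id"), link.tag, target))

print(edges)
```

The p1/p2 pair forms a cycle, which no tree can host directly: the graph exists only once the references have been resolved outside the XML tree.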

This is, for instance, a problem with metadata, i.e. information about the document. Sometimes it has to live inside the document it describes, sometimes outside it.

But even worse, nothing within the XML framework itself caters for the addressing of information, or more generally, the addressing of the things the information is about. This has been the realm of knowledge-oriented systems all along. Many of them have identity built in, one way or the other.

Accordingly, a more generalized picture presents itself like this:

The first row lists the main content paradigms: the relational data model, object-oriented information, then trees as they are used in file systems or XML. With increasing flexibility we then move on to graphs, and on the far right we see full text.

These content paradigms are the logical view of the content; how it is stored, i.e. which storage paradigms are used, is an implementation matter. More on that in the next instalment.

[Figure: content-theory-small.jpg]