Published: 12 Jun 2007
Once you realize that not all data is transactional, you can begin to manage it more intelligently.
Last month I explained my theory of why we're so screwed up infrastructure-wise, or at least how we got to this point. This month I'll try to show you the way out of the situation.
For a few minutes, forget everything you know or at least everything you think you know. Accept my argument that almost everything we've done in commercial IT has been based on transactional requirements. Open your mind.
There are two distinct types of data: dynamic and persistent. Dynamic data is in flux; this is where transactional data begins. Persistent data is fixed. It is what it is and will never be anything else.
Just because data is dynamic doesn't mean it starts and dies within an RDBMS. Structured database data starts as dynamic, but at some point it becomes a nonchanging record. It's persistent. You may have reasons to keep it inside a database forever (although I doubt they're valid ones), but those records are still persistent; they are what they are.
Here are a few rules that will help you:
Rule 1: Don't confuse how something begins its life with how it will end. Everything begins dynamic and ends persistent. Stop delineating between structured, unstructured and semistructured. All types live dynamically for some period, whether it's a Word document, a movie, a credit card transaction or an email. It all ends up as fixed digital content.
Rule 2: The attributes and requirements for each type of data are different. Read/write performance, throughput, redundancy, DR, etc., count more in the dynamic phase of data life; however, we've extended all of those philosophies to data that has stopped changing. Building data redundancies and protection schemas to handle real money transactions is good business; backing up a nonchanging data element a thousand times isn't. Having your bulletproof transaction system capable of handling all the dynamic money events thrown at it is good business. Adding processing power, capacity, network infrastructure, etc., to keep it churning away isn't, not when you could instead remove the 90% of the data that isn't dynamic and can interfere with the real transactional stuff.
Rule 3: The ratio of true dynamic data (and data being "treated" dynamically) to persistent data is approximately 1:10, and that ratio will rapidly evolve to 1:100 and beyond. Dynamic data just doesn't stay dynamic for very long.
Transactionally oriented systems are all about doing things fast. Perform the transaction fast, store the data fast and load the data into other systems fast. If it sits in a database, it's easy to find, which is the point of a database. The persistent data world is all about finding things. The whole categorizing/classifying/indexing/search thing is designed to add structure so we can find things. However, it seems to me that if we created two distinct "virtual" places to look for each distinct type of data, it would be a heck of a lot easier to find what we want. If all our dynamic data sat in one place designed to handle things like that and was then moved (based on business rules) into the persistent digital content store, we'd be able to architect this store entirely differently than the dynamic store.
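To make the two-store idea concrete, here is a minimal sketch in Python of moving data from a dynamic store to a persistent one based on a business rule. Everything in it is an assumption made for illustration: the Record type, the is_persistent rule, the 30-day quiet period and the use of plain dictionaries as stand-ins for the two stores. A real rule might key off a status change or a business event rather than an age threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """Hypothetical record shape; real data would carry far more."""
    key: str
    payload: str
    last_modified: datetime


def is_persistent(record, now, quiet_period=timedelta(days=30)):
    """Assumed business rule: a record untouched for quiet_period is fixed."""
    return now - record.last_modified >= quiet_period


def migrate(dynamic_store, persistent_store, now):
    """Move every record that satisfies the rule; return the moved keys."""
    moved = [key for key, rec in dynamic_store.items() if is_persistent(rec, now)]
    for key in moved:
        # Remove from the dynamic store so it holds only data still in flux.
        persistent_store[key] = dynamic_store.pop(key)
    return moved
```

The point of the sketch is that the rule lives outside both stores, so each store can be designed for its own job: the dynamic one for speed, the persistent one for scale and findability.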
If the dynamic store is about speed and redundancy, the persistent store is about infinite dynamic scale, finding things easily and quickly, and an autonomous self-managing/self-healing infrastructure. It should also be cheap to buy. Stop trying to turn the dynamic store into the persistent one, and also stop trying to make the persistent store dynamic. If you act differently, you'll realize you can get back to making IT a competitive advantage.