The storage industry is currently abuzz with a new mission-critical data storage dilemma: something I like to call "really long-term data storage," or RLTDS. With new, post-Enron Securities and Exchange Commission (SEC) regulations on the horizon that mandate the secure, long-term storage of all financial trade-related data, and with the data storage and protection provisions of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) just now kicking in, many organizations are confronting the need to find a way to store a lot of data reliably for a decade or more.
Industry analyst white-paper mills are already killing rain forests to churn out report after report explaining why RLTDS is the next "killer app" in storage and recommending the best technology for addressing it. It's the biggest thing since the "data explosion," that marvelous myth about annual data doubling that analysts promulgated a couple of years ago to bolster their arguments about the mission criticality of SANs.
With respect to RLTDS, some analysts have adopted EMC's line about a "new category of data" requiring, naturally, "a new category of storage platforms" (e.g., the vendor's Centera platform). EMC's reasoning has its strengths. Data that must be retained for many years, yet accessed more often than a typical archive, may well need the access speed of disk rather than tape, which argues for disk-based platforms. Moreover, as the folks at Hopkinton suggest, such data requires special encoding to ensure that it doesn't change as it is migrated from platform to platform over the years.
Just how to deliver these capabilities, however, is the big question. EMC's Centera may be among the first products in the market aimed specifically at the problem, but will it be the best product over the long haul?
EMC Centera encodes data using a proprietary scheme the vendor acquired when it bought Belgium-based Filepool NV in April 2001. Filepool provided a data naming scheme that identifies data in an organized way, plus a non-repudiation scheme that safeguards against data corruption over time. In effect, the technology eases the transition of data from one Filepool-enabled platform to the next over the years, in a more or less traditional hierarchical storage management scheme, while ensuring that shadows don't suddenly appear on stored x-ray images of patients' lungs as a result of nasty compression algorithms or random bit flipping on the disks themselves. EMC weds the Filepool software to specialized controllers, which are mated to otherwise inexpensive IDE/ATA disk arrays.
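Filepool's encoding is proprietary, and nothing here describes its actual internals. As a rough illustration of the general idea behind naming data in a way that also guards it against corruption, here is a minimal Python sketch that assumes a SHA-256 digest of an object's bytes serves as both its name and its integrity check. The names `content_address`, `ContentStore` and `migrate` are hypothetical, invented for this sketch; this is not EMC's implementation.

```python
import hashlib
from typing import Dict, Iterable


def content_address(data: bytes) -> str:
    """Derive an object's name from its bytes (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical toy platform that keys every object by its content address."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        addr = content_address(data)
        self._objects[addr] = data
        return addr

    def get(self, addr: str) -> bytes:
        data = self._objects[addr]
        # Re-hash on every read: a single flipped bit yields a different digest.
        if content_address(data) != addr:
            raise ValueError(f"object {addr} failed integrity check")
        return data


def migrate(src: ContentStore, dst: ContentStore, addrs: Iterable[str]) -> None:
    """Copy objects to a new platform, verifying each one on both ends."""
    for addr in addrs:
        data = src.get(addr)  # verified on read from the old platform
        dst.put(data)
        dst.get(addr)  # re-verified after landing on the new platform
```

Usage: `addr = old.put(xray_bytes)`, then `migrate(old, new, [addr])`; a later `new.get(addr)` raises `ValueError` if the stored bytes have changed. Because the name is computed from the content, any platform that recomputes the same digest can verify what it inherits. The catch is that the next platform must support the same naming scheme.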
One way to think of the solution is as a primordial effort to get where we all need to go eventually: an object-oriented data naming scheme that enables life-cycle management of the data itself. That would make our current clumsy methods of storage management a thing of the past.
Another way to think of Centera is as the key to realizing EMC's objective of being "where information lives." Forever. Ad aeternum. The smart guys in Hopkinton have long realized that if you can "own" a company's data, the company's hearts and minds will follow.
Getting locked into a vendor's proprietary data naming scheme invites a long-term relationship with that vendor. Simply put, the user will not be able to migrate data from one platform to the next over time unless each next-generation platform supports the same data naming scheme as its predecessor. For EMC, Filepool/Centera will produce wonderful annuities.
What is needed is an open, object-oriented data naming taxonomy that all vendors can support. Such a taxonomy would ensure that anybody's gear could host data over the long haul. The problem, of course, is that nobody in the industry wants an open standard, because it would further commoditize storage.
There has been some talk lately of naming data at the point of creation using XML. The problem is that XML was not designed for this. Current XML wrappers would add about 30 percent to the volume of data produced in day-to-day IT operations, which would translate into greater bandwidth requirements for moving data and an even bigger drain on the storage capacity companies have already deployed.
For the record, this problem was touched on a few years ago in a paper presented at an IBM GUIDE conference. (I have secured from Fred Moore at Horison Information Strategies a tattered, dog-eared hard copy of the original paper, which I keep in paper form as a safeguard against long-term electronic corruption.) While that paper confined itself to data naming in mainframe environments, I am working, with the able assistance of an underground group of storage-industry free thinkers, to develop a criteria-based data naming taxonomy for open systems that will be published later this year in my next book, The Holy Grail of Network Storage Management. Perhaps such a scheme, used in conjunction with an open-standard data format such as the Universal Data Format, would help get us where we need to go.
I will continue to flesh out this concept over the next few columns, and I invite readers to contribute their common-sense views from the trenches to help make the taxonomy a real solution for the rest of us.
Alternatively, we can all send electronic messages about our RLTDS solutions to the KEO project, which is hard at work preparing an Earth-orbiting satellite to serve as a time capsule, part of what UNESCO calls a key project for the 21st century. Messages stored on the satellite should be retrievable in about 500 centuries, so you can tell your great-great-great-great-great-grandchildren about the proprietary system you purchased in 2002. Who knows, they may still be using it by then.
About the author: Jon William Toigo has authored hundreds of articles on storage and technology and writes the monthly SearchStorage "Toigo's Take on Storage" expert column. He is also a frequent site contributor on storage management, disaster recovery and enterprise storage. Toigo has authored a number of storage books, including The Holy Grail of Data Storage Management.