A while back, I wrote a piece about archive in place, my term for an alternative data archive strategy that's increasingly necessitated by the Balkanization of centralized storage infrastructure. Simply put, with the advent of software-defined storage, and the continuing appeal in some quarters of big data analytics leveraging Hadoop, MapReduce and related technologies, it wasn't too far a leap to suggest that some sort of distributed archival strategy -- one that didn't require migrating data to or from a central repository -- might be in order.
Frankly, for what I regarded as an obvious statement of fact, I was surprised by all the positive responses the piece accrued in social media and the blogosphere. Even IT Weekly Newsletter notified me that it had decided to include the piece in its highly selective round-up of technical trade press articles.
Immediately, I had new friends. Caringo, the object storage software company whose bag of tricks includes something called Darkive that is very similar in concept to what I was describing, liked the column a lot. In fact, my archive in place matched the company's thinking about the future of storage and archive rather neatly.
A number of hardware vendors (read: disk array vendors) also contacted me regarding archive in place, some noting they hadn't been in touch with me before because of my seemingly iconoclastic position that tape was the ideal archival medium. (I still believe that, by the way.) They started explaining how low-power modes on disk drives could close the gap between power consumption metrics of disk archives and those of tape, as though that were the only important variable between the archival platforms. Some, like Dell, suggested their "appliance" model -- a Dell server plus Dell storage running software designed to make the kit function like an archive -- might make them an ideal building block for a distributed archive.
SIOS Technology Corp., formerly SteelEye Technology, contacted me to arrange a dog-and-pony show featuring its clustering software technology. The product is designed to augment the clustering capabilities of Windows Server and other popular hypervisors and operating systems, or alternatively, to facilitate the efficient and reliable replication of data between clustered nodes both as a safeguard against data loss and as an enabler of easy active-passive nodal failover. Granted, the SIOS folks were less interested in highfalutin ideas like archive in place than they were in helping companies to move their Tier-1 applications -- those with no tolerance for downtime, whose response time is critical, and that require truly agile resource allocation, efficiency and reliability -- into the "cloud."
To be honest, it felt rather odd that I was getting invites to soirees from which I had previously been excluded because of my tendency to point out the economic and technological gotchas in software-defined, cloud-ified, virtualized hardware strategies. I tried to resist my natural inclination to interrupt the presentations of the various technologies being described to me. I tried not to object to, or seek clarification about, the many contradictions between the facts the vendors were presenting and the real-world capabilities and limitations of the technologies on which their wares are based. I failed.
I found myself wondering aloud why no one (except Caringo and maybe Dell) was talking about data management or infrastructure monitoring and management, without which IT efficiency improvements couldn't be accomplished -- with or without clouds or clustering or hypervisors. If you're delivering a more efficient clustering technology, as in the case of SIOS, why would you cede the entire responsibility for monitoring and managing the hardware and plumbing between clustered nodes to the weak tools of a server operating system? Answer: Because consumers don't understand what's involved in infrastructure management and don't want to learn. Management has never been among the top 10 criteria for server or storage product selection.
But what about data management? Managing how data is stored goes a long way toward containing storage costs, I've been told. Why is there so little attention being paid to the impact of flattening storage through local, direct attachment to virtual machine-hosting server nodes on the overall cost and efficiency of storage? Without some sort of smart storage tiering that places data on various media with different cost of ownership based on data re-reference rates and other criteria, all storage tends to be Tier 1 -- the most expensive kind of storage -- and that can break your budget, as smart guys like Fred Moore have argued over the years. Instead, all of this clusters-with-direct-attached-storage architecture is eliminating tiering and necessitating such contrivances as archive in place to compensate for the lack of centralized storage pooling.
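To put rough numbers on the tiering argument, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tier names, the per-terabyte costs, the re-reference thresholds and the sample data sets are hypothetical, not figures from any vendor or from Fred Moore's work. It simply compares the annual cost of parking everything on Tier 1 against placing each data set by how often it is re-read.

```python
from dataclasses import dataclass

# Hypothetical annual cost per TB for each tier (illustrative figures only).
TIER_COST_PER_TB = {
    "tier1_flash": 1000.0,
    "tier2_disk": 300.0,
    "tier3_tape": 50.0,
}


@dataclass
class DataSet:
    name: str
    size_tb: float
    rereads_per_month: float  # re-reference rate


def choose_tier(ds: DataSet) -> str:
    """Pick a tier from the re-reference rate (thresholds are assumptions)."""
    if ds.rereads_per_month >= 100:
        return "tier1_flash"
    if ds.rereads_per_month >= 1:
        return "tier2_disk"
    return "tier3_tape"


def annual_cost(datasets, tiered: bool = True) -> float:
    """Total yearly cost; with tiered=False everything lands on Tier 1."""
    total = 0.0
    for ds in datasets:
        tier = choose_tier(ds) if tiered else "tier1_flash"
        total += ds.size_tb * TIER_COST_PER_TB[tier]
    return total


# Made-up sample data sets.
datasets = [
    DataSet("orders_db", 10, 5000),
    DataSet("file_shares", 40, 10),
    DataSet("old_projects", 50, 0.1),
]

flat_cost = annual_cost(datasets, tiered=False)
tiered_cost = annual_cost(datasets, tiered=True)
print(f"All Tier 1: ${flat_cost:,.0f} per year")
print(f"Tiered:     ${tiered_cost:,.0f} per year")
```

With these invented figures, the flattened layout costs roughly four times as much as the tiered one. The real ratio depends entirely on actual media costs and on how much of your data is rarely re-referenced.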
Taken together, the promulgation of mostly unmanaged data on mostly unmanaged hardware is creating a huge data disaster risk that the industry seems to want to deal with mostly by simply replicating all data all over hell and half of Georgia. That will certainly bolster the bottom lines of certain software-defined storage software, server clustering software and disk/flash hardware vendors, but I'm not sure it does much for users. Caringo is doing a yeoman's job of trying to salvage this situation with a "manage in place" object storage framework, but even its wares would perform better if the hardware topology beneath them, the physical infrastructure, made more sense.
Archive in place is an interesting idea and one we need to develop further for as long as users follow the lead of their vendors and opt for nodal clustering and distributed storage architecture over shared common storage architectures with effective data and infrastructure management. It seems to me that vendors want even the smallest firms to pretend they have the unlimited budgets required to replicate everything n-ways, so that the failure of a specific hardware/software stack -- whether from some disaster or poor maintenance and management -- won't be felt by business operations. Just take the bad gear off the line, buy new gear, redeploy, synchronize and go.
That sounds great in theory, but I don't want to be the one to argue what that will do to next year's budget request. If you think it costs a lot to maintain and fix whatever breaks in legacy infrastructure over its useful five- to seven-year life, try figuring out what it will cost to replace server/storage stacks every couple of weeks or months. Clusters, I have found, often lose their lustre when examined through a budgetary lens.
About the author:
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.