One of the issues I have when vendors pitch cloud archive or cloud backup or cloud storage is that they put the cart before the horse. Before any organization can start replicating data to a cloud -- whether for preservation, protection, security or compliance, or just to free up capacity -- they need to have some sense of what data they are copying. There is no one-size-fits-most strategy when it comes to enterprise data management.
While I/O is I/O, bits are bits and blocks are blocks, not all data has the same value to the business. Data inherits its criticality, like so much DNA, from the applications that produce and use it. Applications, in turn, are only as important as the role they play in a mission-critical business process. Absent any cross-reference to business process and application criticality, data is just a bunch of anonymous magnetic or optical signals.
Failure to understand the differences in your data will lead to mistakes in developing plans for disaster recovery, security or compliance. Only a small amount of the data you store is frequently referenced, changed or modified. Those are the bits that are properly called "production data." In studies conducted by the Data Management Institute, production data totals about 30% of the data you store. The other 70% is a combination of orphan data, contraband data and data copies -- as well as data that must be retained for business or regulatory reasons.

Where you store the data is also critical to how you plan for data protection, security and compliance. Given the hardware and hypervisor stack-centrism of most data storage today, you must know where the data that serves a critical app is located, and then find efficient ways to apply the right kinds of data hosting, protection services and archival policies to that data.
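As a rough first cut at the production-versus-retained split described above, access recency can stand in for "frequently referenced or changed." The sketch below is a minimal, hypothetical illustration at the file-system level: the 90-day threshold and the function name are assumptions, not a standard, and real classification would also weigh application criticality.

```python
import os
import time

def classify_by_age(root, active_days=90):
    """Bucket files under root into 'active' (modified within active_days)
    and 'stale' byte totals -- a crude proxy for production vs. other data.
    The threshold is an illustrative assumption, not a recommendation."""
    cutoff = time.time() - active_days * 86400
    active_bytes = 0
    stale_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip broken symlinks, permission errors
            if st.st_mtime >= cutoff:
                active_bytes += st.st_size
            else:
                stale_bytes += st.st_size
    total = active_bytes + stale_bytes
    share = active_bytes / total if total else 0.0
    return active_bytes, stale_bytes, share
```

Run against a share or home-directory tree, a survey like this often shows an active share well below half of stored bytes, which is consistent with the 30/70 split cited above.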
Enterprise data management processes like data protection (data copy) are neither hardware- nor hypervisor-agnostic for the most part. IBM has its preferred ways to make copies of data stored on its kit, EMC has its preferences, VMware doesn't want its VSAN data to be replicated to a Microsoft Clustered Storage Spaces repository and NetApp doesn't like having its filers replicate to Isilon rigs. In many cases, vendors have erected actual barriers to prevent replicating data outside their stack.
Whether your infrastructure or storage platforms are proprietary or open, your data protection (copy) strategy still should account for data criticality, access frequency, change rates, as well as expedited restore, failover and failback. This enterprise data management analysis is hard to do manually -- and therefore often doesn't get done.
Even before you create data protection or retention strategies, consider data class and usage characteristics when building the storage infrastructure itself. Hosting all data on Tier 0 or Tier 1 storage, which IBM ironically notes is done much too often, is costly and inefficient -- just like replicating, encrypting, and archiving everything forever. This will become abundantly clear when companies begin to approach zettabyte-sized data repositories. Yet, chances are good that little attention has been paid to application criticality or data availability requirements in selecting platforms for hosting data in the first place.
So we are often confronted with a cluttered storage junk drawer that makes it tempting to forget about slicing and dicing data into classes or categories and to just copy the whole mess to a cloud. That's what a lot of cloud service vendors want us to do -- even though it is poor strategy.
That's where enterprise data management technologies like those from Catalogic Software become important. Catalogic's ECX enables companies to collect data about all their data, then to apply protection (copy) services judiciously and without requiring data to be moved from where it is located. The methodology for using the product is embedded in its user interface.
To start, you catalog the data wherever it is -- gathering metadata on data behavior such as usage, retention and growth rates. Tools for data discovery work pretty well, but they are limited to certain equipment and hypervisor brands, with the range of platforms that can be accessed and cataloged increasing based on vendor alliances and customer feedback, according to Catalogic.
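The cataloging step can be pictured with a minimal sketch at the file-system level: take periodic metadata snapshots of a tree and diff them to estimate growth. The field names and functions here are illustrative assumptions; a product like ECX gathers far richer metadata across storage arrays and hypervisors.

```python
import os

def snapshot(root):
    """Record per-file size and modification time for every file under root.
    A crude stand-in for a metadata catalog."""
    catalog = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or deny access mid-walk
            catalog[path] = {"bytes": st.st_size, "mtime": st.st_mtime}
    return catalog

def growth(before, after):
    """Net byte growth between two snapshots of the same tree -- the raw
    input for a growth-rate estimate."""
    return (sum(r["bytes"] for r in after.values())
            - sum(r["bytes"] for r in before.values()))
```

Snapshots taken on successive days or weeks, diffed this way, yield the usage and growth-rate behavior that retention and protection policies depend on.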
Catalogic helps you document the topology of the storage infrastructure -- it creates diagrams showing how the data is hosted and how it is being copied or replicated, including where those copies are stored, how often they are updated and how long the replicas are retained. This is valuable information, especially to the cadre in the data center that seeks to maintain stability and resilience in "legacy" operations.
ECX provides a means to use select data in new applications or databases, in analytics processes and so on. It's a real timesaver for developers to be able to cross-reference production data or a copy that already exists, instead of copying the data yet again into a separate instance.
The product provides tools to analyze data, assess the efficacy of existing replication processes and spot topology limitations, boosting your enterprise data management capabilities. It also can help improve data hosting, eliminate chokepoints, reduce wasteful copying and copy retention, and fine-tune the protection services applied to the data.
Catalogic technology is getting us closer to simplifying the essential, yet undervalued, planning and management tasks that should be part of storage infrastructure design, data protection and data preservation. The company demonstrated the latest version of its product to me in early February, about the same time it announced joint offerings with IBM. It is clear that this technology is "enterprise class."