- Steve Ricketts, Taneja Group
Years of data backups have left many organizations with multiple copies of data that's hard to track and manage....
Data access has also become a challenge. When application owners want copies of production data, they must often submit a ticket to IT and wait days, or even weeks, for a response.
These data management practices aren't optimal and have led to higher-than-necessary storage costs, data compliance issues and constraints on agility and productivity, among other problems. In addition, digital transformation is driving massive data growth, and malicious activity is widespread. Given all this, it's easy to see why most organizations are placing a high priority on modernizing data protection and secondary data environments.
Copy data management is an exciting answer to these problems. It focuses on both protecting production data and improving the management of production data copies. The goals are to cut storage costs, improve data visibility and compliance, and speed data access.
Changing copy data management market
Until last year, the copy data management market was mostly CDM point products from companies such as Actifio and Catalogic Software. But the CDM market is rapidly changing as some of the largest storage and data protection vendors jump in.
EMC (now Dell EMC) and IBM introduced CDM products last year, and early this year, Veritas came out with Veritas Velocity CDM. It's not surprising that vendors are interested in copy data management. It's a natural extension of their storage and data protection products, and many organizations have moved beyond tire kicking to implementing CDM. A 2017 Taneja Group study found that more than 30% of companies are evaluating CDM products or have implemented them.
So what CDM capabilities do companies value the most? And how are vendors responding to user demand for CDM functionality? The answers to these questions provide an understanding of where the copy data management market is having the greatest impact. They also identify which vendors are leading the effort to provide universal data visibility, instant access to data, automated data protection, and data portability that capitalizes on hybrid and multicloud environments.
IT pros responding to the Taneja Group survey listed lowering storage costs; better data visibility, insight and compliance; and the ability to consolidate secondary storage as the top three CDM capabilities beyond data protection. Better data lifecycle management and automated copy management with DevOps workflows also ranked in the top five CDM capabilities.
Customers often point to the importance of data reduction for lowering storage costs. Compression and deduplication have been a major part of data protection products for a long time. Copy data management vendors, such as Cohesity, are maximizing data reduction with global, variable-length deduplication that works across a customer's entire storage footprint. Actifio also offers global deduplication, and Dell EMC Enterprise Copy Data Management (eCDM) and Veritas Velocity provide global deduplication when implemented as an extension of Dell EMC Data Domain and Veritas NetBackup, respectively.
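To make the idea of global, variable-length deduplication concrete, here is a minimal Python sketch. It is not any vendor's implementation. The names `split_chunks` and `DedupStore`, the window and size parameters, and the simple rolling byte sum (a toy stand-in for a production-grade fingerprint) are all assumptions chosen for illustration. The sketch cuts data into chunks at content-defined boundaries, then stores each unique chunk once in a single shared index.

```python
import hashlib


def split_chunks(data: bytes, window: int = 16, mask: int = 0x1F,
                 min_size: int = 32, max_size: int = 512) -> list:
    """Cut data into variable-length chunks at content-defined boundaries.

    A boundary is declared when the sum of the last `window` bytes matches
    a bit pattern, so cut points depend on content rather than fixed offsets.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == mask
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Hypothetical global chunk index shared by every stored copy."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep each unique chunk once
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Storing the same data a second time returns an identical recipe and adds no new chunks to the index, which is the effect that keeps redundant copies from consuming extra capacity.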
Another aspect of lowering storage costs is the ease of maintaining a storage system. Cohesity's DataPlatform sets the bar high here, with the inherent simplicity in its hyper-converged storage architecture. It simplifies system upgrades and recovery from node failures because there's no disruption, no manual configuration and no need for data migration.
The flexibility to use cost-effective storage is also important. In-place CDM products that aren't tied to purpose-built appliances have an advantage here. For instance, storage companies, such as Dell EMC, Hitachi and IBM, prioritize CDM support for specific storage devices. Independent software providers, such as Catalogic and Commvault, deliver in-place support for a variety of storage devices, using the native snapshot and replication capabilities of the storage arrays for which they provide CDM support. CDM products with a scale-out, nodal architecture, such as Cohesity DataPlatform, use nodes that run on commodity hardware. Copy data management vendors that support disaster recovery as a service in the cloud can further reduce storage costs by eliminating the need to maintain multiple physical data centers.
Data visibility improved
Reducing the number of data copies is another way to cut storage costs. This leads to another top capability highlighted by survey respondents: having better data visibility, insight and compliance. All CDM vendors offer a comprehensive metadata catalog that provides insight into physical and virtual infrastructure resources, letting administrators quickly determine where copy data lives and when data is accessed. Copy data management vendors also provide analytics with predefined reports, dashboards with filters and the ability to create custom reports for specific needs.
Search is another essential capability. Search filters help administrators quickly find objects that match certain criteria, such as virtual machines (VMs), volumes in a certain location or data copies bigger than a certain size. Vendors in the copy data management market that stand out here are those that deliver deep file search functionality, which is required for security compliance.
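The kind of filtered search described above can be sketched in a few lines of Python. This is an illustration only. The `CopyRecord` fields, the `find_copies` function and the sample `CATALOG` entries are hypothetical and don't reflect any vendor's catalog schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CopyRecord:
    """One entry in a hypothetical copy data catalog."""
    name: str
    kind: str        # for example "vm" or "volume"
    location: str    # for example a site or cloud region label
    size_gb: float


def find_copies(catalog: Iterable[CopyRecord],
                kind: Optional[str] = None,
                location: Optional[str] = None,
                min_size_gb: Optional[float] = None) -> List[CopyRecord]:
    """Return catalog entries that satisfy every filter that was supplied."""
    results = []
    for record in catalog:
        if kind is not None and record.kind != kind:
            continue
        if location is not None and record.location != location:
            continue
        if min_size_gb is not None and record.size_gb < min_size_gb:
            continue
        results.append(record)
    return results


# Made-up sample data for illustration only.
CATALOG = [
    CopyRecord("erp-db-copy-01", "vm", "site-a", 750.0),
    CopyRecord("erp-db-copy-02", "vm", "site-b", 750.0),
    CopyRecord("hr-share-snap", "volume", "site-a", 120.0),
    CopyRecord("web-test-copy", "vm", "site-a", 40.0),
]
```

Calling `find_copies(CATALOG, kind="vm", location="site-a", min_size_gb=100)` would return only the first entry, since it is the only VM copy at that location larger than 100 GB.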
CDM and data protection: A replacement or a complement?
Three-quarters of respondents to a recent Taneja Group survey said copy data management, or CDM, was either complementary to data protection or a whole new category of technology that will replace data protection.
Interviews revealed that while IT professionals don't necessarily agree on whether CDM is a data protection complement or replacement, they do agree that its functionality should be seamlessly integrated with data protection. This must be done in a way that provides a unified experience whether the goal is to monitor, protect, manage or optimize the secondary data environment.
A core tenet of CDM is giving application owners fast self-service access to data copies. This is another top capability from the survey: "Enable automated copy creation and management for DevOps workflows." Developers, QA staff and database administrators have pushed for this capability, demanding timely data access and the ability to get copies without IT involvement. Self-service has become a popular CDM use case, supported by all major vendors in the copy data management market. It's the on-ramp to CDM for many companies. Self-service for DevOps or test-dev data management is increasingly the first CDM use case customers deploy, according to Actifio. Dell EMC reported that application admins, rather than centralized IT, now run 20% to 25% of their customers' backup jobs.
Providing application owners and admins instant access through self-service includes finding, provisioning and managing resources using a service fabric, portal or marketplace; role-based access controls; data virtualization; integrated lifecycle management; data masking, or obfuscation; and integration with developer tools, database tools and management platforms. Application and database support can be a differentiator, and vendors are continually expanding the list of virtual and physical applications and databases they support. IBM recently added support for SAP HANA, Epic Caché databases and Microsoft SQL Server on physical servers.
Data lifecycle management
CDM also enables data lifecycle management through policy-based orchestration. True automation requires spinning up and down an entire infrastructure, which means creating policies that provision data copies; setting network parameters, refresh frequencies and retention periods; and cleaning up copies and VMs as needed.
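A policy of this kind can be pictured as a small set of rules evaluated on a schedule. The Python sketch below is a simplified illustration. The `CopyPolicy` and `DataCopy` types and the `plan_actions` function are hypothetical names, and the sketch covers only refresh frequency and retention, not network settings or VM cleanup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class CopyPolicy:
    """Hypothetical policy: how often to refresh and how long to retain."""
    name: str
    refresh_every: timedelta
    retain_for: timedelta


@dataclass(frozen=True)
class DataCopy:
    copy_id: str
    created_at: datetime


def plan_actions(policy: CopyPolicy, copies: List[DataCopy],
                 now: datetime) -> List[str]:
    """Decide which copies to delete and whether a fresh copy is due."""
    actions = []
    live = []
    for copy in copies:
        if now - copy.created_at > policy.retain_for:
            actions.append(f"delete {copy.copy_id}")  # past retention period
        else:
            live.append(copy)
    newest = max((c.created_at for c in live), default=None)
    if newest is None or now - newest >= policy.refresh_every:
        actions.append("provision new copy")  # refresh interval has elapsed
    return actions
```

With a daily refresh and seven-day retention, a copy created 10 days ago would be marked for deletion, and a new copy would be provisioned if the newest surviving copy is at least a day old.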
All major vendors in the copy data management market offer policy-based orchestration, but managing service-level agreement (SLA) compliance and cloud support can set a vendor apart. For example, Dell EMC's eCDM offers full-lifecycle SLA compliance that monitors quality of service against each SLA. A visual workflow builder also helps with ease of use. Hitachi Data Instance Director is strong in this area.
Cloud support has become an important aspect of orchestration and data lifecycle management. Look for vendors that offer policy-based cloud tiering for data archival and support for on-demand cloud workloads, as well as disaster recovery in the cloud. Also, look for vendors that support major public clouds -- Amazon Web Services, Google Cloud Platform and Microsoft Azure -- and offer native cloud support rather than relying on third-party cloud gateways. Actifio, Cohesity and Dell EMC offer extensive cloud support.
Consolidating secondary storage workloads
Another high-ranking capability was the ability to consolidate secondary storage workloads or use cases, such as backups, data archival, test-dev and analytics workloads, and shared file services. Flexibility and scalability are the two factors that play an important role in workload consolidation. Flexibility comes from support for multiple storage devices; multiple protocols, such as iSCSI, NFS, SMB and Amazon Simple Storage Service; and multiple hypervisors or virtual environments, including Microsoft Hyper-V and VMware vSphere. Most vendors start with VMware vSphere support and then add support for Hyper-V and other hypervisors, such as Oracle VM VirtualBox, but not all vendors in the copy data management market support all environments.
The depth of support for virtual environments should be examined. For VMware environments, companies evaluating CDM products should ask vendors about support for the VMware vStorage APIs for Data Protection, vSAN and the vRealize Suite. Scalability is another important factor for workload consolidation. Vendors commonly support a scale-up architecture, or the ability to expand capacity by adding disks. Some offer a scale-out architecture that lets companies expand horizontally by adding more nodes. Scale-out architecture is less familiar to most companies, but worth considering. For example, Cohesity offers a scale-out architecture that enables throughput and IOPS to increase linearly with cluster size. This helps companies consolidate workloads and scale as storage demands grow, without compromising performance.
It's hard to say if or when copy data management will become synonymous with data protection in terms of importance and adoption. But it's fair to say that CDM is going mainstream. Many companies are considering, evaluating and implementing copy data management as they look beyond centralized data protection and toward modernizing secondary data environments. In doing this, they're providing on-demand, self-service access to copies of production data and secondary storage for file services, as well as using indexing, advanced search, and analytics to find out-of-place confidential data and ensure data compliance.