Digital transformation is the key IT trend driving enterprise data center modernization. Businesses today rapidly deploy web-scale applications, file sharing services, online content repositories, sensors for internet of things implementations and big data analytics. While these digital advancements facilitate new insights, streamline processes and enable better collaboration, they also increase unstructured data at an alarming rate.
The massive growth of unstructured data can quickly strain legacy file storage systems, which are poorly suited to managing it at this scale. Taneja Group investigated the most common of these file storage limitations in a recent survey. The study found the top challenges IT faces with traditional file storage are lack of flexibility, poor storage utilization, inability to scale to petabyte levels and failure to support distributed data. These obstacles often lead to high storage costs, complex storage management and limited flexibility in unstructured data storage.
So how are companies addressing the unstructured data management challenge? As with all things IT, it's essential to have the right architecture. For unstructured data storage, this means a highly scalable, resilient, flexible, economical and accessible secondary storage environment.
Let's take a closer look at modern unstructured data storage requirements and examine why distributed file systems and a scale-out object storage design, or scale-out storage, are becoming a key part of modern secondary storage management.
Scalability and resiliency
Given the huge amounts of unstructured data, scalability is undeniably the most critical aspect of modern secondary storage. This is where scale-out storage shines. It's ideal for managing huge amounts of unstructured data because it easily scales to hundreds of petabytes simply by adding storage nodes. This inherent advantage over scale-up file storage appliances that become bottlenecked by single or dual controllers has prompted several data protection vendors to offer scale-out secondary storage platforms. Notable vendors with scale-out secondary storage offerings are Cohesity, Rubrik and -- most recently -- Commvault.
Attaining storage resiliency is another important requirement of modern secondary storage. Two key factors are required to achieve storage resiliency. The first is high fault tolerance. Scale-out storage is ideal in this area because it uses space-efficient erasure coding and flexible replication policies to tolerate site, multiple node and disk failures.
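To make the space-efficiency point concrete, here is a minimal sketch comparing the raw capacity needed per usable terabyte under replication and under erasure coding. The specific schemes (3-way replication, 4+2 and 8+3 erasure coding) are illustrative assumptions rather than any vendor's defaults, and the arithmetic assumes each copy or fragment lands on a separate node or disk and that the erasure code can rebuild data from any k of its k+m fragments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionScheme:
    """Hypothetical model of a data protection layout (not a vendor API)."""
    name: str
    data_fragments: int     # fragments that hold the original data
    extra_fragments: int    # parity fragments, or additional full copies

    @property
    def failures_tolerated(self) -> int:
        # Assumes every fragment sits on a different node or disk.
        return self.extra_fragments

    @property
    def raw_per_usable(self) -> float:
        # Raw capacity consumed for each unit of usable capacity.
        return (self.data_fragments + self.extra_fragments) / self.data_fragments


def replication(copies: int) -> ProtectionScheme:
    # N full copies: one "data" copy plus N-1 extra copies.
    return ProtectionScheme(f"{copies}-way replication", 1, copies - 1)


def erasure_coding(k: int, m: int) -> ProtectionScheme:
    # k data fragments plus m parity fragments.
    return ProtectionScheme(f"{k}+{m} erasure coding", k, m)


if __name__ == "__main__":
    for scheme in (replication(3), erasure_coding(4, 2), erasure_coding(8, 3)):
        print(
            f"{scheme.name}: {scheme.raw_per_usable:.3f} TB raw per usable TB, "
            f"tolerates {scheme.failures_tolerated} failures"
        )
```

Under these assumptions, 4+2 erasure coding tolerates the same two failures as 3-way replication while consuming half the raw capacity, which is why the two approaches are usually offered side by side as policy choices.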
Rapid data recovery is the second key factor for storage resiliency. For near-instantaneous recovery times, IT managers should look for secondary storage products that provision clones from backup snapshots to recover applications in minutes or even seconds. These products should also let administrators run recovered applications directly on secondary storage until data is copied back to primary storage, and they should be able to orchestrate the recovery of multi-tier applications.
Flexibility and cost
To handle multiple, unstructured data storage use cases, modern secondary storage must also be flexible. Central to flexibility is multiprotocol support. Scale-out storage should support both file and object protocols, such as NFS for Linux, SMB or CIFS for Windows and Amazon Simple Storage Service for web-scale applications. True system flexibility also requires modularity, or composable architecture, which enables multidimensional scalability and I/O flexibility. Admins must be able to quickly vary computing, network and storage resources to accommodate IOPS-, throughput- and capacity-intensive workloads.
Good economics is another requirement for modern secondary storage. Scale-out storage reduces hardware costs by enabling software-defined storage that uses standard, off-the-shelf servers. It's also simple to maintain. Administrators can easily upgrade or replace computing nodes without having to migrate data among systems, reducing administration time and operating costs. Scale-out secondary storage also provides the option to store data in cost-effective public cloud services, such as Amazon Web Services, Google Cloud and Microsoft Azure.
Moreover, scale-out storage reduces administration time by eliminating storage silos and the rigid, hierarchical structure used in file storage appliances. It instead places all data in a flat address space or single storage pool. Scale-out secondary storage also provides built-in metadata file search capabilities that help users quickly locate the data they need.
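As a rough illustration of the flat address space idea, the sketch below keeps every object in a single pool keyed by a unique name and finds objects by their metadata rather than by walking a directory tree. The `FlatPool` class, its method names and the metadata fields are all hypothetical, and a real product would index metadata rather than scan it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                   # unique name in the flat pool
    size_bytes: int
    metadata: dict[str, str] = field(default_factory=dict)


class FlatPool:
    """Hypothetical single storage pool: one flat key space, no hierarchy."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, size_bytes: int, **metadata: str) -> None:
        # Slashes in a key are just characters, not directories.
        self._objects[key] = StoredObject(key, size_bytes, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def search(self, **criteria: str) -> list[str]:
        # Linear scan for clarity; return keys whose metadata matches
        # every criterion, in sorted order.
        return sorted(
            obj.key
            for obj in self._objects.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    pool = FlatPool()
    pool.put("backups/db01/2024-01-01.snap", 5_000, app="erp", owner="finance")
    pool.put("logs/web/access.log", 1_200, app="web", owner="it")
    pool.put("reports/q4.pdf", 300, app="erp", owner="finance")
    print(pool.search(owner="finance"))
```

Because lookups depend only on keys and metadata, data can move between nodes without users having to learn a new path, which is the administrative saving described above.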
Some vendors, such as Cohesity, offer full-text search that facilitates compliance activities by letting companies quickly find files containing sensitive data, such as passwords and Social Security numbers. Add to this support for geographically distributed environments, and it's easy to see why scale-out storage is essential for cost-effectively managing large-scale storage environments.
The final important ingredient of modern secondary storage environments is providing easy access to services required to manage secondary data. As the amount of unstructured data grows, IT can make things easier for storage administrators and improve organizational agility by giving application owners self-service tools that automate the full data lifecycle. This means providing a portal or marketplace and predefined service-level agreement templates that establish the proper data storage parameters. These parameters include recovery points, retention periods and workload placement based on a company's standard data policies. Secondary storage should also integrate with database management tools, such as Oracle Recovery Manager.
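A predefined service-level agreement template can be pictured as a small record of the parameters named above. The following is a minimal sketch only: the tier names, intervals, retention periods and placement labels are invented for illustration and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaTemplate:
    """Hypothetical SLA template an application owner could pick in a portal."""
    name: str
    recovery_point: timedelta   # maximum gap between protected copies
    retention: timedelta        # how long protected copies are kept
    placement: str              # where the workload's copies are stored


# Illustrative catalog; every value here is an assumption.
TEMPLATES = {
    "gold": SlaTemplate("gold", timedelta(minutes=15), timedelta(days=365), "on-premises"),
    "silver": SlaTemplate("silver", timedelta(hours=4), timedelta(days=90), "on-premises"),
    "bronze": SlaTemplate("bronze", timedelta(hours=24), timedelta(days=30), "public-cloud"),
}


def max_copies_retained(template: SlaTemplate) -> int:
    # Upper bound on copies held at once, assuming one copy per
    # recovery-point interval and no thinning of older copies.
    return template.retention // template.recovery_point


def assign(workload: str, tier: str) -> dict:
    # Self-service step: the owner names a workload and picks a tier.
    template = TEMPLATES[tier]
    return {
        "workload": workload,
        "template": template.name,
        "placement": template.placement,
        "max_copies": max_copies_retained(template),
    }


if __name__ == "__main__":
    print(assign("payroll-db", "silver"))
```

The design point is that application owners choose a named tier rather than setting individual parameters, so storage administrators define the company's standard data policies once and the portal applies them consistently.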
Clearly, distributed file systems and scale-out object storage architectures are a key part of modern secondary storage offerings. Secondary storage product portfolios are evolving to address the immense unstructured data storage needs of modern organizations in the digital era. So stay tuned, as I expect nearly all major data protection vendors will introduce scale-out secondary storage products over the next 12 to 18 months.