Small World Big Data
Published: 06 Oct 2017
We all know flash storage is fast, increasingly affordable and quickly beating out traditional spinning disk for primary storage needs. It's like all our key business applications have been magically upgraded to perform 10 times faster!
In the data center, modern primary storage arrays now come with massive flash caching, large flash tiers or are all flash through and through. Old worries about flash wearing out have been largely forgotten. And there are some new takes on storage designs, such as Datrium's, that make great use of less-expensive server-side flash. Clearly, spending money on some kind of flash, if not all flash, can be a great IT investment.
Yet, as everyone builds primary storage with flash, there is less differentiation among those flashy designs. At some point, "really fast" is fast enough for now, assuming you aren't in financial trading.
Rather than argue whose flash is faster, more reliable, more scalable or even cheaper, the major enterprise IT storage concern is shifting toward getting the most out of whatever high-performance primary storage investment gets made. Chasing ever-greater performance can be competitively lucrative, but universally, we see business demand for larger operational data sets growing quickly. Flash or not, primary storage still presents an ever-present capacity-planning challenge.
A new 'big data' opportunity
The drive to optimize shiny new primary storage pushes IT folks to use it as much as possible with suitable supporting secondary data storage. As this is literally a new "big data" opportunity, there is a correspondingly big change happening in the secondary storage market. Old-school backup storage designed solely as an offline data protection target doesn't provide the scale, speed and interactive storage services increasingly demanded by today's self-service-oriented users.
We're seeing a massive trend toward interactive, online, secondary storage architectures. Instead of dumping backups, snapshots and archives into slow, near-online or essentially offline deep storage tiers, organizations are finding it's worthwhile to keep large volumes of second-tier data in active use. With this shift to online secondary data storage, end users can quickly find and recover their own data like they do with Apple's Time Machine on their Macs. And organizations can profitably mine older, colder, larger data sets for valuable insights, using techniques such as big data analytics, machine learning and deep historical search.
If that sounds like a handy convergence of backup and archive, you're right. There's increasingly less difference between data protection backup and recovery and retention archiving. Backing up with these approaches can mean copying versions of changed data, such as file changes and incremental snapshots, from primary storage into the online secondary data storage. Recovery is then a matter of copying out -- also online and interactively -- instant, complete virtual images of the desired historical version that are maintained transparently to the end user.
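To make the backup-and-recovery flow above concrete, here is a minimal Python sketch. It assumes a toy in-memory store in which each backup records only the files that changed since the previous one, and a restore overlays those changes to rebuild a complete image of any historical version. The class name `VersionStore` and its methods are hypothetical illustrations, not any vendor's API.

```python
from typing import Dict, List, Optional


class VersionStore:
    """Hypothetical secondary store that keeps only changed files per backup."""

    def __init__(self) -> None:
        # One delta per backup: path -> new content, or None if the file was deleted.
        self._deltas: List[Dict[str, Optional[bytes]]] = []
        # Copy of the most recently backed-up state, used to detect changes.
        self._last: Dict[str, bytes] = {}

    def backup(self, primary: Dict[str, bytes]) -> int:
        """Record what changed on primary storage and return the new version number."""
        delta: Dict[str, Optional[bytes]] = {}
        for path, data in primary.items():
            if self._last.get(path) != data:
                delta[path] = data  # new or modified file
        for path in self._last:
            if path not in primary:
                delta[path] = None  # deletion marker
        self._deltas.append(delta)
        self._last = dict(primary)
        return len(self._deltas) - 1

    def changed_paths(self, version: int) -> List[str]:
        """List the paths that this backup actually had to copy or mark deleted."""
        return sorted(self._deltas[version])

    def restore(self, version: int) -> Dict[str, bytes]:
        """Rebuild the complete image of a historical version by replaying deltas."""
        if not 0 <= version < len(self._deltas):
            raise ValueError(f"unknown version {version}")
        image: Dict[str, bytes] = {}
        for delta in self._deltas[: version + 1]:
            for path, data in delta.items():
                if data is None:
                    image.pop(path, None)
                else:
                    image[path] = data
        return image


store = VersionStore()
v0 = store.backup({"a.txt": b"1", "b.txt": b"2"})
v1 = store.backup({"a.txt": b"1", "b.txt": b"3"})  # only b.txt changed
v2 = store.backup({"b.txt": b"3", "c.txt": b"4"})  # a.txt deleted, c.txt added
```

Here `store.restore(v0)` still returns both original files even though later backups copied only the changes. A real product would typically track changes at the block or chunk level and present the image without physically copying it; this sketch only shows the versioning idea.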
If the storage is resilient enough through data protection algorithms -- such as N-way replication, erasure coding and high availability metadata services -- you can also reliably archive into it. The resulting archived data remains immediately available online, albeit not as high performing as top-tier flash. For big primary flash shops, this online archive capability translates into significant flash capacity savings. Essentially, the organization's global online data space becomes the sum of primary and secondary storage capacities.
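The capacity arithmetic behind that last point can be sketched in a few lines of Python. The figures below are made-up examples, and the function names are hypothetical. The sketch assumes the standard overhead formulas: N-way replication leaves 1/N of raw capacity usable, and erasure coding with k data shards and m parity shards leaves k/(k+m) usable.

```python
def usable_replicated(raw_tb: float, copies: int) -> float:
    """Usable capacity when every object is stored as `copies` full replicas."""
    return raw_tb / copies


def usable_erasure_coded(raw_tb: float, data_shards: int, parity_shards: int) -> float:
    """Usable capacity when objects are split into data shards plus parity shards."""
    return raw_tb * data_shards / (data_shards + parity_shards)


def global_online_space(primary_tb: float, secondary_usable_tb: float) -> float:
    """Total online data space: primary capacity plus usable secondary capacity."""
    return primary_tb + secondary_usable_tb


# Illustrative numbers only: 50 TB of primary flash, 300 TB of raw secondary disk.
replicated = usable_replicated(300, copies=3)                               # 100.0 TB
erasure_coded = usable_erasure_coded(300, data_shards=8, parity_shards=4)   # 200.0 TB
total = global_online_space(50, erasure_coded)                              # 250.0 TB
```

With these example numbers, the same raw secondary capacity yields twice the usable space under 8+4 erasure coding as under three-way replication, which is one reason the protection scheme matters when sizing an online archive.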
First-class secondary storage
Several vendors, including Cohesity, Hedvig, Igneous Systems, Qumulo, Rubrik and Scality, are positioning their massively scalable storage as this new type of secondary storage for data centers. Some of their products might have initially been designed for web-scale applications, next-generation hybrid cloud storage, big data lakes and other uses, but all of them are finding big opportunities with organizations tired of their aging, complex and narrowly featured data protection stacks.
These modern secondary storage vendors aren't trying to replace the primary storage footprint or compete with direct flash performance. They're actually encouraging back-end use of public cloud vendors -- for example, Cleversafe became IBM Cloud Object Storage. However, they are in some cases also offloading traditional primary storage workloads, such as file services, onto their so-called secondary data storage.
Many of these newer secondary storage offerings are built around core object storage platforms, such as Igneous, or massively scale-out parallel file systems, such as Cohesity. Some are positioned as software-defined, and others are sold as appliances with optimizing hardware designs. While their implementation, constraints, net performance, scalability and other intended uses may vary widely, for secondary storage, they tend to offer similar capabilities, including the following:
- online high performance for both large file ingest and read IO;
- massive scalability to billions, if not trillions, of files and objects;
- hybrid tiering to other, even colder storage systems, such as Amazon Web Services' Simple Storage Service and Glacier;
- global, online metadata indexing; end-to-end global namespace; and distributed access across the full storage capacity;
- built-in storage analytics to help with capacity planning, usage and more;
- policy-based automation engines for fine-grained, incremental backup, archive, retention, and security and access policy control;
- direct online storage protocols for end-user access and for supporting next-generation web-scale applications, including object storage protocols such as REST (representational state transfer) APIs, direct SMB shares and NFS mounts;
- superior data protection that could include features such as in-flight and at-rest encryption, sync and async N-way replication, distributed RAID and erasure coding, and high-availability metadata services; and
- high cost-efficiency even as the storage infrastructure grows, through linearly performing scale-out clustering, global inline dedupe, intelligent compression, and thin and virtual clones.
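As a rough illustration of the hybrid tiering and policy-based automation items in the list above, the following Python sketch picks a storage tier from how long an object has sat idle. The tier names, the `TieringPolicy` fields and the day thresholds are all hypothetical; real products expose their own policy models.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical idle-time thresholds, in days, for each tier transition."""
    cold_after_days: int = 90       # move to a colder tier, e.g. cloud object storage
    archive_after_days: int = 365   # move to a deep archive tier
    expire_after_days: int = 2555   # retention period over, roughly seven years


def choose_tier(last_access: date, today: date,
                policy: TieringPolicy = TieringPolicy()) -> str:
    """Return the tier an object belongs in, based on days since last access."""
    idle_days = (today - last_access).days
    if idle_days >= policy.expire_after_days:
        return "expire"
    if idle_days >= policy.archive_after_days:
        return "archive"
    if idle_days >= policy.cold_after_days:
        return "cold"
    return "online"


today = date(2017, 10, 6)
recent = choose_tier(date(2017, 9, 1), today)   # idle about a month
stale = choose_tier(date(2017, 3, 1), today)    # idle about seven months
old = choose_tier(date(2015, 1, 1), today)      # idle almost three years
```

A policy engine of this kind would run across the global metadata index rather than object by object, but the decision at its core is this simple comparison of age against thresholds.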
Think like a cloud provider
If making a flash investment go much further isn't compelling enough, IT should consider the opportunity to become more like a cloud provider. We've been advising our IT clients to start thinking like internal service providers instead of just reacting to business demands as a cost center. Despite the "secondary" in the use case name, these new storage designs really bring cloud-like storage to the data center. Calling them secondary might even be a bit insulting.
While upgrading primary storage to flash speeds can alleviate a lot of IT pain -- for example, by eliminating poor storage performance support calls -- primary flash by itself doesn't transform IT's relationship with end users. However, these new secondary data storage designs are turning IT into business heroes because they transform how end users interact with storage, make a lot more data available online for business use and push that flash investment to the max.