Future of data storage: Addressing traditional architecture problems
Date: Sep 27, 2013
Even with a growing number of new data storage technologies in the virtual space, traditional storage systems remain prevalent. To learn more about the future of data storage, Editorial Director Rich Castagna sat down with Phil Goodwin, principal architect of Cognizant's IT infrastructure services group, at the Storage Decisions seminar in Chicago. In this video, they discuss the problems with traditional storage technologies, how they're being addressed, and what the future of data storage looks like on the horizon.
There's been a lot of talk about getting deduplication working with primary storage. What's the status of that, and why has progress been so slow?
Phil Goodwin: Well, one thing that's important to clarify is that when we talk about dedupe, organizations like Data Domain kind of set a standard when they said, "We can get 20:1, 32:1, 50:1 types of compression."
You're simply not going to get that kind of compression in your traditional storage or online systems. If you look at what NetApp does, I think they guarantee a 2:1, and you might get better than that, but I think they guarantee no more than 2:1.
The practical limitation really is rehydration -- the extra IOPS necessary to pull that data back out. An analogy might be a highly normalized database. The nice thing about a normalized database is that it really shrinks in size. The bad news is that to pull up an individual record, instead of going to that one record and retrieving it with a single I/O, you may now need two, three, four, 10 different I/Os to get that single record. It's the same issue with deduplication: you're putting more IOPS and performance pressure on your array in order to take advantage of that deduplication.
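Goodwin's rehydration point can be illustrated with a toy model (the data structures and chunk sizes here are hypothetical, not from the interview): in a deduplicated store, reading a file back costs one back-end lookup per chunk reference, even though identical chunks are stored only once.

```python
# Toy model of deduplicated storage: a file is stored as an ordered list
# of chunk fingerprints, and each unique chunk lives once in a shared pool.
# Rehydrating a file costs one lookup (I/O) per chunk reference, versus a
# single sequential read for an undeduplicated file.

chunk_pool = {}    # fingerprint -> chunk bytes, shared across files
file_recipes = {}  # filename -> ordered list of fingerprints

def write_file(name, chunks):
    """Store a file as chunk references, deduplicating identical chunks."""
    recipe = []
    for chunk in chunks:
        fp = hash(chunk)                 # stand-in for a content fingerprint
        chunk_pool.setdefault(fp, chunk)  # store each unique chunk only once
        recipe.append(fp)
    file_recipes[name] = recipe

def read_file(name):
    """Rehydrate a file; returns (data, number of back-end lookups)."""
    recipe = file_recipes[name]
    data = b"".join(chunk_pool[fp] for fp in recipe)  # one lookup per chunk
    return data, len(recipe)

write_file("report.doc", [b"AAAA", b"BBBB", b"AAAA", b"CCCC"])
data, ios = read_file("report.doc")
print(len(chunk_pool), ios)  # 3 unique chunks stored, but 4 lookups to read back
```

Capacity shrinks (three chunks stored instead of four), but the read path fans out into multiple I/Os -- the same trade-off as the normalized-database analogy.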
When we look at storage alternatives, most of them seem to be addressing capacity. Isn't performance also an issue?
Goodwin: Yeah, and when you're talking about performance, there are really two things: There's the IOPS, the I/O capability of the back end, but there's also the throughput performance of the front end. Where we're really seeing the problem manifest is more on the IOPS back end. Traditionally the way you got more performance out of an array was to put more devices in it, because the more spindles you have, the more IOPS you have. But what that led to was really gross inefficiencies in terms of capacity because a lot of large-scale organizations would be 20%, 30%, 40% utilized on a capacity basis, which means their cost per gigabyte is two or three times what it really should be. And when you're talking about petabytes of information, you're talking about millions of dollars in management costs that are being wasted in that kind of environment.
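The cost inefficiency Goodwin describes follows directly from utilization: you pay for raw capacity, but only the used fraction does work. A quick sketch (the $0.10/GB price is a made-up illustrative number):

```python
def effective_cost_per_gb(raw_cost_per_gb, utilization):
    """Cost per *used* gigabyte when only a fraction of capacity holds data."""
    return raw_cost_per_gb / utilization

# A hypothetical $0.10/GB array at the utilization levels mentioned:
for util in (0.20, 0.30, 0.40):
    print(f"{util:.0%} utilized -> ${effective_cost_per_gb(0.10, util):.2f} per used GB")
```

At 40% utilization the effective cost is 2.5x the raw cost, and at 20% it is 5x, which lines up with the "two or three times what it really should be" figure.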
So IT organizations have started to take advantage of these wonderful high-capacity devices -- the 1 TB, 2 TB, 3 TB types of drives, which really shrink the footprint of the arrays. The bad news shows up on an IOPS-per-gigabyte basis. Take, just to make the math easier, a 400 GB SAS drive: you can get roughly 200 IOPS out of that drive, so five of those drives give you 2 TB and about 1,000 IOPS -- call it 500 IOPS per terabyte. Take a 1 TB SATA drive, and you're really only getting about 75 IOPS out of it.
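The arithmetic above can be written out directly (the drive figures are the rough ones from the interview):

```python
def iops_per_tb(drive_capacity_gb, drive_iops):
    """Aggregate IOPS from enough drives to provide 1 TB (1,000 GB)."""
    drives_needed = 1000 / drive_capacity_gb
    return drives_needed * drive_iops

sas = iops_per_tb(400, 200)    # 2.5 drives -> 500 IOPS per terabyte
sata = iops_per_tb(1000, 75)   # 1 drive   -> 75 IOPS per terabyte
print(sas, sata, sas / sata)   # the SAS layout delivers several times the IOPS
```

Per terabyte, the small SAS drives deliver several times the IOPS of the big SATA drive -- the inverse relationship between capacity and performance Goodwin goes on to describe.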
So you have an inverse relationship between the capacity of the device and its performance, and we are starting to see some skewing. So what I recommend to my clients is to look at automated storage tiering -- have a thin layer of solid-state drives at the top, have that automated software that has the ability to move hot data from your inexpensive, high-capacity disk up to the high-performance disk, and then back once it cools. And I think that's how you really can get the best of both worlds.
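A minimal sketch of the tiering policy recommended above, assuming a simple access-count heuristic (the threshold, slot cap, and block names are all hypothetical): blocks that cross a hotness threshold are promoted to the thin SSD layer, and everything else stays on (or cools back down to) high-capacity disk.

```python
# Hypothetical automated-tiering pass: promote the hottest blocks to a
# small SSD tier, capped by available slots; demote everything else.

SSD_THRESHOLD = 100  # accesses per interval that qualify a block as "hot"

def retier(access_counts, ssd_slots):
    """Given {block_id: accesses}, return (ssd_tier, hdd_tier) placements."""
    hot = sorted((b for b, n in access_counts.items() if n >= SSD_THRESHOLD),
                 key=lambda b: -access_counts[b])
    ssd = set(hot[:ssd_slots])          # the thin SSD layer is deliberately small
    hdd = set(access_counts) - ssd      # cooled or cold data stays on capacity disk
    return ssd, hdd

counts = {"blk1": 500, "blk2": 40, "blk3": 250, "blk4": 120, "blk5": 3}
ssd, hdd = retier(counts, ssd_slots=2)
print(sorted(ssd))  # the two hottest blocks win the SSD slots
```

Real tiering software weighs recency, sequentiality, and migration cost rather than a single counter, but the promote-hot/demote-cool loop is the core idea.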
The way data is protected and stored on systems is also being re-examined now. Is something like erasure coding likely to replace RAID?
Goodwin: Yes, interesting question: Is RAID dead? Probably not yet, or at least not in the immediately foreseeable future. But when you're talking about RAID these days, especially with the really large devices, you do need to go to a double-parity scheme, or in some cases triple parity -- there are different schemes, and you can have as much parity as you really want.
That introduces two problems. One is the performance overhead associated with computing parity, but you're also putting more and more capacity into what I would term a nonproductive environment, where the only thing a parity drive is doing is sitting there waiting for something to fail. It inevitably will, which is why you do it, but you may have 20%, 30%, 40% overhead of nonproductive capacity.
So what we are seeing is more movement toward object-based storage. The erasure-coding method you mentioned depends on replicating and distributing data across systems rather than on a parity-type environment.
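As a toy illustration of erasure-style reconstruction, here is the simplest possible code -- a single XOR coding fragment (real object stores use stronger codes such as Reed-Solomon that survive multiple losses): any one lost fragment can be rebuilt from the survivors, wherever they are stored.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(fragments):
    """Append one XOR coding fragment so any single loss is recoverable."""
    return fragments + [reduce(xor_bytes, fragments)]

def recover(stored, lost_index):
    """Rebuild the fragment at lost_index by XOR-ing all surviving pieces."""
    survivors = [f for i, f in enumerate(stored) if i != lost_index]
    return reduce(xor_bytes, survivors)

# Three data fragments, distributed with one coding fragment:
pieces = encode([b"\x01\x02", b"\x04\x08", b"\x10\x20"])
print(recover(pieces, 1))  # the lost middle fragment, rebuilt from the rest
```

The storage overhead here is one extra fragment out of four (25%), and because fragments can live on different nodes or sites, reconstruction works across systems rather than inside a single parity group.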
So what are some other new developments that you think we'll see in the future of data storage systems?
Goodwin: One of the drivers is virtualized server environments, and what I think I'm seeing more and more is a move toward all solid-state arrays. Not too long ago I wrote a piece on solid-state arrays, and I thought that I'd done a pretty good job of investigating the vendor community out there. Then I came to Storage Decisions, walked onto the floor, and there were five vendors I had never heard of. So suddenly we have a very dynamic, burgeoning market for those solid-state arrays.
I think the problem that they're going to solve is really one of data motion. Because if you think about a virtualized environment, it's wonderful that you can vMotion systems and applications from one data center to another data center, but that data has got to be staged somewhere. So I see the market really bifurcating where we're going to have all solid-state arrays in the data center to get the performance for those applications where we have a lot of data access. But then [it becomes] more of a hybrid cloud environment where we're going to take those high-capacity devices and move them off into the cloud. Then [we'll] have hybrid devices that are able to access [those devices] between the data center and the cloud environment. I see that architecture being more prevalent in the coming years.
About the expert:
Phil Goodwin is a senior manager and principal architect in Cognizant’s IT infrastructure services group, where he assists clients in the development of adaptive storage architectures, storage management best practices, backup and recovery, disaster recovery and data archiving.