The latest trends and innovation in tiered data storage continue to center on increasingly sophisticated algorithms that can automatically move data between lightning-quick tiers of flash drives and lower-performance SAS and SATA disks.
Ashish Nadkarni, a research director in the storage systems practice at Framingham, Mass.-based International Data Corp., said vendors have analyzed numerous data points from storage systems deployed in the field to refine their algorithms and improve their automated tiering software to ensure efficient movement of data and optimal cost benefit.
In this podcast with TechTarget senior writer Carol Sliwa, Nadkarni discusses the advancements in tiered data storage, the types of workloads suited to automated storage tiering, how flash and cache fit into an automated tiering strategy, and some of the problems encountered with automated storage tiering. He also offers recommendations on how IT shops can best take advantage of the latest developments in tiered data storage.
Storage tiering is hardly new, but organizations now have more options for tiering their storage. What major advancements have you seen in the past year?
Ashish Nadkarni: Tiered data storage is no longer considered optional. It's seen more as a feature that most people will buy with the promise of cutting down their footprint [and] getting tiers like flash and the slowest tier, like nearline SAS, to coexist. And the tiering software itself -- take [Fully Automated Storage Tiering for Virtual Pools] FAST VP from EMC as an example -- has now gained sophisticated algorithms that intelligently move data in and out of the performance tier and the colder tiers automatically. That reduces the manual burden that tiering previously placed on administrators.
Vendors are also trying to sell tiering as a way to reduce the overall cost footprint of their solutions. In the past, if you needed a certain number of IOPS from a storage array without tiering, you would have had to buy a lot more drives and short-stroke them [using only a small portion of each disk's capacity to cut seek times]. Tiering software adds intelligence, so you can do the same thing with far fewer drives. You can also use the same software to manage tiering across mixed workloads, so you don't have to worry about the tiering software getting overwhelmed and such.
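The drive-count argument can be made concrete with back-of-the-envelope arithmetic. The sketch below is a simplified model, not a vendor sizing tool: the per-drive IOPS and capacity figures, the 20,000-IOPS target, and the assumption that tiering places the hottest 20% of the data on flash, where it absorbs 80% of the I/O, are all hypothetical inputs chosen for illustration. It also ignores RAID overhead and controller cache.

```python
import math
from dataclasses import dataclass


@dataclass
class DriveSpec:
    iops: float         # sustained random IOPS per drive (assumed figure)
    capacity_tb: float  # usable capacity per drive in TB (assumed figure)


def drives_needed(iops: float, capacity_tb: float, spec: DriveSpec) -> int:
    """Smallest drive count that meets BOTH the IOPS and the capacity target."""
    by_iops = math.ceil(iops / spec.iops)
    by_capacity = math.ceil(capacity_tb / spec.capacity_tb)
    return max(by_iops, by_capacity)


def hdd_only(target_iops: float, capacity_tb: float, hdd: DriveSpec) -> int:
    # Without tiering, spinning disks must deliver every IOPS, so the IOPS
    # term dominates and much of each disk's capacity goes unused
    # (the short-stroking effect).
    return drives_needed(target_iops, capacity_tb, hdd)


def tiered(target_iops: float, capacity_tb: float, hot_io_frac: float,
           hot_cap_frac: float, ssd: DriveSpec, hdd: DriveSpec) -> tuple[int, int]:
    # Assumes the tiering software has already moved the hot data to flash:
    # flash holds hot_cap_frac of the data and absorbs hot_io_frac of the I/O,
    # while the disks hold the rest of the data and serve the remaining I/O.
    hot_iops = target_iops * hot_io_frac
    hot_tb = capacity_tb * hot_cap_frac
    n_ssd = drives_needed(hot_iops, hot_tb, ssd)
    n_hdd = drives_needed(target_iops - hot_iops, capacity_tb - hot_tb, hdd)
    return n_ssd, n_hdd


if __name__ == "__main__":
    hdd = DriveSpec(iops=180, capacity_tb=1.2)   # hypothetical 15K SAS drive
    ssd = DriveSpec(iops=5000, capacity_tb=0.8)  # hypothetical SSD
    print(hdd_only(20_000, 50, hdd))               # 112
    print(tiered(20_000, 50, 0.8, 0.2, ssd, hdd))  # (13, 34)
```

Under these made-up inputs, the disk-only design needs 112 drives purely to reach the IOPS target, while the tiered design needs 13 SSDs and 34 disks, with the flash count driven by capacity rather than IOPS. Lowering `hot_io_frac` narrows the gap, which matches the caveat about workloads where most of the data is always hot.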
How popular is automated storage tiering at this point in time, and in what types of environments does it make the most sense?
Nadkarni: It's getting popular. One vendor told me recently that 60% to 70% of its midrange solutions are sold with automated tiering in place. So a lot more suppliers are shipping storage arrays with automated tiering, and [they're] enabling it by default when the arrays are deployed.
The promise is that these tiering solutions can work with mixed workloads, such as a combination of SQL Server databases, Exchange and virtual environments. Typically, the vendor will tell you that the tiering software should be left alone. That means [letting] the software sort of self-learn and figure out which data to place on which tier, and [giving] you a certain percentage mix, meaning how much capacity should be flash and how much should be SAS and such. The tiering software then automatically figures out the best way to move data in and out. Day by day, more workloads are becoming acceptable candidates for automated tiering. Of course, there are exceptions. If 80% of your data is always hot, then there's no point in having it sit on a tiering solution where only 20% of the capacity is flash. However, [usually] that's not the case. In most environments, with structured, unstructured and semi-structured data and the applications that use it, this should be fine.
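The "self-learning" behavior described here can be sketched in two steps: keep a per-extent heat score that blends each monitoring window's I/O counts with a decayed history, then place the hottest extents on flash until the flash share of capacity is used up. This is a generic heat-based placement written for illustration, not EMC FAST VP's or any other vendor's actual algorithm; the extent names, the decay factor of 0.5, and the 10-extent pool with 20% flash are all assumptions.

```python
from typing import Dict, List


def update_heat(heat: Dict[str, float], window_io: Dict[str, int],
                decay: float = 0.5) -> Dict[str, float]:
    """Blend the latest monitoring window's I/O counts into a decaying score."""
    extents = set(heat) | set(window_io)
    return {e: decay * heat.get(e, 0.0) + window_io.get(e, 0) for e in extents}


def place_on_flash(heat: Dict[str, float], flash_slots: int) -> List[str]:
    """Greedy placement: the hottest extents fill the flash tier first."""
    ranked = sorted(heat, key=lambda e: heat[e], reverse=True)
    return ranked[:flash_slots]


def flash_hit_ratio(heat: Dict[str, float], flash_slots: int) -> float:
    """Fraction of I/O that would be served from flash after placement."""
    total = sum(heat.values())
    if total == 0:
        return 0.0
    return sum(heat[e] for e in place_on_flash(heat, flash_slots)) / total


if __name__ == "__main__":
    # 10 equal-size extents, 20% of capacity on flash -> 2 flash slots.
    skewed = {"e0": 500, "e1": 300, **{f"e{i}": 25 for i in range(2, 10)}}
    always_hot = {f"e{i}": 100 for i in range(10)}
    print(flash_hit_ratio(skewed, 2))      # 0.8
    print(flash_hit_ratio(always_hot, 2))  # 0.2
```

With a skewed workload, two flash extents serve 80% of the I/O; when every extent is equally hot, the same 20% flash tier serves only 20%, which is the case where tiering buys little. In this sketch, `update_heat` is meant to be called once per monitoring window, before placement is recomputed.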
Do most people use flash with automated storage tiering these days, and how well does flash work in those environments?
Nadkarni: Most people do use flash. In fact, flash is what allows that tiering software to give you a blend of IOPS when you need them and an economical dollar per gigabyte when you don't need those IOPS. Flash is what allows you to give your storage array that performance-on-demand characteristic, and then when the performance is not needed, that data gets slowly funneled back to the slower tier. Without flash, unfortunately, you wouldn't be able to get the most out of the tiering software.
Where do flash cache and DRAM cache fit into an automated tiering strategy?
Nadkarni: Think of it as tier 0. If you have two or three tiers in your automated tiering setup, then flash cache and DRAM cache are often considered tier 0, where you get the most performance. The next tier could then be [multilevel cell] MLC or [enterprise MLC] eMLC flash, which offers a certain level of endurance and a certain level of density. Then you would have your SAS drives and then your nearline SAS drives, which would be the slower and more persistent tiers.
When you need IOPS on demand, that's when this cache comes into play: certain types of persistent reads get pushed into this tier and are then serviced from it. So when you need IOPS the most, they get serviced from tier 0. When you don't, the data gets pushed down to a tier with a cheaper dollar per GB.
Have you heard about any problems with automated storage tiering?
Nadkarni: I have heard about problems with automated storage tiering where the tiering solution doesn't work for the application in question or becomes more of a burden on the application. The best example I can give you is virtual desktops. With [virtual desktop infrastructure] VDI, when people switch on their computers, there is a burst of I/O. An automated tiering solution may not get enough time to move all the data into cache before that I/O has to be serviced.
Keep in mind that an automated tiering solution is not a Swiss Army knife. It is not going to solve 100% of the problems that customers see in their environments. It is meant to solve most of them, and in that respect, I believe automated tiering solutions do work. They're a general-purpose solution that can work in 80% to 90% of the environments that traditionally waste a lot of storage because of IOPS requirements. [Automated tiering also works for environments that] are putting too much emphasis on flash as a tier and not enough on the fact that, in a particular data set, only 20% to 30% of the data might be active, and the remaining 70% to 80% doesn't always have to occupy precious space on a flash tier.
What advice would you offer to organizations on the latest developments you've seen in tiered data storage?
Nadkarni: There is always inertia, or resistance to changing the status quo, and organizations have good reasons for it, because they've probably been burned by the first or second revisions of tiering software. But my advice would be not to hold back and to embrace the new generations of automated tiering solutions, much as they've embraced other trends in the market. These solutions are here to stay. Think of them in terms of the overall economics of storage.
The second piece of advice would be to think of automated tiering software as part of the overall storage strategy. Just as you would look at storage optimization technologies like thin provisioning, compression and deduplication, think of automated tiering in the same way. All of those things should be considered together when you're examining your storage strategy.
And then the third piece of advice is -- and this is more of a tactical, operational thing -- don't try to override the automated tiering solution. I've talked with a lot of customers who started off with automated tiering and somewhere along the way during the deployment phase they realized that, 'Oh well, we shouldn't let this database be serviced from a slower tier,' and [they] had the inclination to pin some of the data in the faster tier. And, in fact, pinning the data, which is almost like overriding the automated solution, causes more performance problems than just letting the system learn from the I/O pattern to figure out the best way to service those I/Os.
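Why pinning can backfire is easier to see in a toy model. The sketch below, an illustration rather than any vendor's pinning feature, reserves flash slots for pinned extents before filling the remaining slots by heat. The extent names and I/O counts are hypothetical: an administrator pins an entire database, including a rarely read archive extent, to a three-slot flash tier.

```python
from typing import Dict, List


def place_with_pins(heat: Dict[str, int], flash_slots: int,
                    pinned: List[str]) -> List[str]:
    """Pinned extents take flash first; remaining slots go to the hottest rest."""
    pins = [e for e in pinned if e in heat][:flash_slots]
    rest = sorted((e for e in heat if e not in pins),
                  key=lambda e: heat[e], reverse=True)
    return pins + rest[:flash_slots - len(pins)]


def flash_hit_ratio(heat: Dict[str, int], on_flash: List[str]) -> float:
    """Fraction of total I/O served by the extents placed on flash."""
    total = sum(heat.values())
    return sum(heat[e] for e in on_flash) / total if total else 0.0


if __name__ == "__main__":
    heat = {"db_index": 400, "db_log": 300, "db_archive": 20,
            "vdi_images": 250, "scratch": 30}
    learned = place_with_pins(heat, 3, pinned=[])
    pinned_db = place_with_pins(heat, 3,
                                pinned=["db_index", "db_log", "db_archive"])
    # ['db_index', 'db_log', 'vdi_images'] 0.95
    print(learned, flash_hit_ratio(heat, learned))
    # ['db_index', 'db_log', 'db_archive'] 0.72
    print(pinned_db, flash_hit_ratio(heat, pinned_db))
```

Letting placement follow the measured heat puts 95% of the I/O on flash; pinning the whole database drops that to 72%, because the cold archive extent occupies a slot the VDI images needed. If the pinned data were genuinely the hottest, pinning would cost nothing in this model, so the harm depends on how much cold data rides along with the pin.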