Published: 17 Dec 2012
An effective QoS implementation helps tune data storage to meet the specific needs of applications. New tools that offer more automation are emerging to help.
Practically every storage array vendor claims its box has quality of service (QoS) built in. To a degree, all these vendors are correct. The trouble, however, is how each one defines quality of service.
If you define QoS as the features built into your array then, based on that loose definition, you have QoS capability. I checked Wikipedia for a common definition. The term QoS entered our vocabulary through telephony and networking technologies. The one sentence that caught my eye in the Wikipedia entry was, "Quality of service is the ability to provide different priority to different applications, users, or data flows, or to guarantee a certain level of performance to a data flow." This is probably the best description that can be applied to storage.
The basic issue we've grappled with for decades is how to deliver the right storage performance to an application. Earmarking capacity with certain performance characteristics has now become commonplace. LUNs can be created using a variety of RAID types, and volumes can be carved out and allocated to applications. But if two or more applications are served from the same LUN, each application asks to be serviced and gets whatever is available. So, sometimes the application is starved for I/O and at other times it has more than enough, which makes application performance unpredictable. That problem is compounded by server virtualization because 10 applications may be running on the same server accessing the same datastore. Application performance becomes even more unpredictable.
There are a number of ways to deal with this. For starters, every storage array comes with tools that provide performance information. If you don't like what you see, you make changes. Take the application off that server completely and fire up a new one. See where the hot spots are for each application. Perhaps create a new LUN for that application. But in most cases the process is manual. Auto-tiering software has helped to automate the process. Early auto-tiering products would move an application that needed faster storage response to a higher tier of storage. Auto-tiering was further refined to work at a sub-volume level. That meant only the data that was hot was automatically moved to a higher tier, while "cooler" data was moved to a lower tier.
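The sub-volume idea can be illustrated with a minimal sketch. It assumes the array tracks an access count per extent and that the fast tier holds a fixed number of extents. The function name `place_extents`, the extent labels and the tier names "fast" and "slow" are hypothetical, chosen only for illustration, and do not describe any vendor's algorithm.

```python
def place_extents(access_counts, fast_tier_slots):
    """Assign each extent to a tier based on how often it is accessed.

    access_counts: dict mapping an extent id to its recent access count
                   (hypothetical metric; real arrays use their own heuristics).
    fast_tier_slots: number of extents the fast tier can hold.
    """
    # Rank extents from hottest to coolest; ties are broken by extent id
    # so the result is deterministic.
    ranked = sorted(access_counts, key=lambda e: (-access_counts[e], e))
    hot = set(ranked[:fast_tier_slots])
    # Hot extents go to the fast tier, everything else to the slow tier.
    return {e: ("fast" if e in hot else "slow") for e in access_counts}


# Two volumes, two extents each; only the busy extents are promoted.
counts = {"vol1:0": 900, "vol1:1": 5, "vol2:0": 450, "vol2:1": 12}
placement = place_extents(counts, fast_tier_slots=2)
print(placement)
```

Note that this placement looks only at how hot the data is, not at how important the owning application is to the business.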
Unfortunately, most QoS implementation efforts stop right there. That still represents a giant step forward, but I think there's another key step, without which QoS remains incompletely realized. That final step is to guard against the following condition: There are three applications vying for I/O. From a business perspective, application three is the least important app but it's behaving in a rogue fashion and hot spots are everywhere. So it gets serviced using the QoS method described above, but that process starves applications one and two simply because application three asked for the services first.
In an ideal environment this should never happen. Each application should be prioritized at the outset with a minimum set of services (IOPS, throughput, latency and so on) assigned by the administrator. The storage array should have the intelligence built in to automatically deal with the constantly changing performance landscape. And, regardless of anything else, it must provide the minimum set of assigned resources to each app and allocate any excess available in the order of the established priorities. All of this should be managed by the array with no human intervention needed.
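As a rough sketch of that policy, the allocation can be thought of as two passes: first honor every application's guaranteed minimum, then hand out whatever is left in priority order. The example below uses IOPS only and assumes a fixed array capacity. The `App` fields, the `allocate_iops` function and all the numbers are hypothetical and serve only to illustrate the idea, not to describe how any particular array does it.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    priority: int     # 1 = most important to the business
    min_iops: int     # floor assigned by the administrator
    demand_iops: int  # what the application is currently asking for


def allocate_iops(apps, capacity):
    """Grant each app its minimum, then share the excess by priority."""
    if sum(a.min_iops for a in apps) > capacity:
        raise ValueError("guaranteed minimums exceed array capacity")

    # Pass 1: every app receives its floor (or its demand, if lower).
    grants = {a.name: min(a.min_iops, a.demand_iops) for a in apps}
    spare = capacity - sum(grants.values())

    # Pass 2: remaining IOPS go to apps in order of established priority.
    for a in sorted(apps, key=lambda a: a.priority):
        extra = min(a.demand_iops - grants[a.name], spare)
        grants[a.name] += extra
        spare -= extra
    return grants


# Application three is the least important but the most demanding.
apps = [
    App("one", priority=1, min_iops=3000, demand_iops=5000),
    App("two", priority=2, min_iops=2000, demand_iops=4000),
    App("three", priority=3, min_iops=1000, demand_iops=20000),
]
print(allocate_iops(apps, capacity=10000))
# -> {'one': 5000, 'two': 4000, 'three': 1000}
```

In this sketch the rogue application three still receives its assigned minimum, but it cannot starve applications one and two no matter how much I/O it requests.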
Right now we're close to this level of QoS sophistication, but only a few arrays have this type of intelligence and control built in. In a virtualized or cloud environment, where there could be tens, maybe hundreds of applications running as virtual machines, the only realistic way to allocate storage performance is via automation.
No discussion of a QoS implementation would be complete without mentioning that many other parameters can be brought under the umbrella of QoS. For instance, QoS might also include the level of data protection and the "breadth" of access (synchronous across multiple sites, asynchronous globally). QoS might also need to cover cache "contention" policies so cache usage can be prioritized. VMware's virtual volume (vVol) concept, introduced at VMworld 2012, also has QoS implications. We'll take a look at vVols and their potential impact on QoS in a future column. Flash may also change how QoS is managed in hybrid and all-solid-state arrays (another topic for a future QoS discussion).
For now, when you're evaluating a storage array, ask the vendor how performance provisioning is done and how you can get "micro" control over the allocation of scarce resources, not just capacity.
About the author:
Arun Taneja is founder and president at Taneja Group, an analyst and consulting group focused on storage and storage-centric server technologies.