Thin provisioning can help you use your disk capacity much more efficiently, but you need to get under the hood a little to understand how thin provisioning will work in your environment.
Nobody wants to pay for something they're not using, but enterprise data storage managers do it all the time. The inflexible nature of disk storage purchasing and provisioning leads to shockingly low levels of capacity utilization. Improving the efficiency of storage has been a persistent theme of the industry and a goal for most storage professionals for a decade, but only thin provisioning technology has delivered tangible, real-world benefits.
The concept of thin provisioning may be simple to comprehend, but it's a complex technology to implement effectively. If an array only allocates storage capacity that contains data, it can store far more data than one that allocates all remaining (and unnecessary) "white space." But storage arrays are quite a few steps removed from the applications that store and use data, and no standard communication mechanism gives them insight into which data is or isn't being used.
What you should ask about thin provisioning
When evaluating a storage array that includes thin provisioning, consider the following questions, which reflect the broad spectrum of approaches to this challenge. Note that not all capabilities are required in all situations.
- Is thin provisioning included in the purchase price or is it an extra-cost option?
- Does the array support zero page reclaim? How often does the reclamation process run?
- What is the page size or thin provisioning increment?
- Does thin provisioning work in concert with snapshots, mirroring and replication? Is thick-to-thin replication supported?
- What does the array do when it entirely fills up? What's the process of alerting, freeing capacity and halting writes?
- Does the array support WRITE_SAME? What about SCSI UNMAP or ATA TRIM?
- Is there a VMware vStorage APIs for Array Integration (VAAI) "block zeroing" plug-in? Is it the basic T10 plug-in or a specialized one for this array family?
Storage vendors have taken a wide variety of approaches to address this issue, but the most effective mechanisms are difficult to implement in existing storage arrays. That's why next-generation storage systems, often from smaller companies, have included effective thin provisioning technology for some time, while industry stalwarts may only now be adding this capability.
Traditional storage provisioning maintains a one-to-one map between internal disk drives and the capacity used by servers. In the world of block storage, a server would "see" a fixed-size drive, volume or LUN and every bit of that capacity would exist on hard disk drives residing in the storage array. The 100 GB C: drive in a Windows server, for example, would access 100 GB of reserved RAID-protected capacity on a few disk drives in a storage array.
The simplest implementation of thin provisioning is a straightforward evolution of this approach. Storage capacity is aggregated into "pools" of same-sized pages, which are then allocated to servers on demand rather than on initial creation. In our example, the 100 GB C: drive might contain only 10 GB of files, and this space alone would be mapped to 10 GB of capacity in the array. As new files are written, the array would pull additional capacity from the free pool and assign it to that server.
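The allocate-on-demand idea can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: the `ThinPool` class, its 1 GB page size and the method names are all assumptions made for illustration.

```python
class ThinPool:
    """Toy allocate-on-write pool; the page size and names are illustrative."""

    def __init__(self, total_pages, page_size_gb=1):
        self.page_size_gb = page_size_gb
        self.free_pages = list(range(total_pages))  # physical pages not yet assigned
        self.page_map = {}  # (volume, virtual page index) -> physical page

    def write(self, volume, offset_gb):
        """Map a physical page the first time a virtual page is written."""
        key = (volume, offset_gb // self.page_size_gb)
        if key not in self.page_map:
            if not self.free_pages:
                raise RuntimeError("pool exhausted: cannot honor allocation")
            self.page_map[key] = self.free_pages.pop()
        return self.page_map[key]

    def allocated_gb(self, volume):
        """Physical capacity actually consumed by one volume."""
        pages = sum(1 for (vol, _) in self.page_map if vol == volume)
        return pages * self.page_size_gb


pool = ThinPool(total_pages=50)    # 50 GB of real capacity behind the pool
for offset in range(10):           # the server writes 10 GB of files
    pool.write("C:", offset)       # to what it sees as a 100 GB C: drive
print(pool.allocated_gb("C:"))     # 10
```

The server never learns that only 10 GB is backed by real disk; the mapping lives entirely in the array.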
This type of "allocate-on-write" thin provisioning is fairly widespread today. Most midrange and enterprise storage arrays, and some smaller devices, include this capability either natively or as an added-cost option. But there are issues with this approach.
One obvious pitfall is that such systems are only thin for a time. Most file systems use "clear" space for new files to avoid fragmentation; deleted content is simply marked unused at the file system layer rather than zeroed out or otherwise freed up at the storage array. These systems will eventually gobble up their entire allocation of storage even without much additional data being written. This not only reduces the efficiency of the system but risks "over-commit" issues, where the array can no longer meet its allocation commitments and write operations come to a halt.
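A rough simulation shows how quickly this creep can happen. The model below is deliberately simplified and its names are hypothetical: it assumes the file system always places new files in never-used blocks while any remain, and that deletes are invisible to the array.

```python
def simulate_churn(volume_blocks, live_blocks, cycles):
    """Return the array-side allocation after each write/delete cycle.

    Toy model: each cycle writes `live_blocks` of new files into "clear"
    space and then deletes them at the file system layer only.
    """
    touched = set()   # blocks the array has had to back with real capacity
    next_clear = 0    # next never-used block, as the file system sees it
    history = []
    for _cycle in range(cycles):
        for _block in range(live_blocks):
            # Wrap around to previously used blocks once clear space runs out.
            touched.add(next_clear % volume_blocks)
            next_clear += 1
        history.append(len(touched))
    return history


# Live data never exceeds 10 blocks, yet the array ends up fully allocated.
print(simulate_churn(volume_blocks=100, live_blocks=10, cycles=12))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 100, 100]
```

Real file systems reuse freed space sooner than this worst case, so the curve is usually gentler, but the direction is the same.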
That doesn't mean thin provisioning is useless without thin reclamation (see "The enemies of thin," below), but the long-term benefit of the technology may be reduced. Plus, since most storage managers assume that thin storage will stay thin, effectively reclaiming unused space is rapidly becoming a requirement.
The enemies of thin
"I may need 500 GB or more for this application," the DBA thinks, so just to play it safe she asks the storage administrator for 1 TB. The storage admin has the same idea, so he allocates 2 TB to keep the DBA out of his office. This familiar story is often blamed for the sad state of storage capacity utilization, but is that justified?
In most enterprise storage environments, poor capacity utilization can come from many sources:
- Annual and per-project purchasing cycles that encourage occasional over-buying of storage capacity that may never be used
- Ineffective resource monitoring and capacity planning processes that obscure capacity requirements
- Incomplete storage networking that strands capacity out of reach of the systems needing it
- Disjointed allocation procedures resulting in assigned-but-never-used storage capacity
- Inflexible operating systems and file systems that make it difficult to grow and shrink as storage demands change
Thin provisioning can be effective in many of these situations, but it's no magic bullet. Organizations with poor purchasing and capacity planning processes may not benefit much, and all the capacity in the world is useless if it can't be accessed over a segmented SAN. But even the most basic thin provisioning system can go a long way to repurpose never-used storage capacity.
The thin reclamation challenge
The tough part of thin provisioning technology is reclaiming unused capacity rather than correctly allocating it. Returning no-longer-used capacity to the free pool is the key differentiator among thin provisioning implementations, and the industry is still very much in a state of flux in this regard.
The root cause of the thin reclamation challenge is a lack of communication between applications and data storage systems. As noted earlier, file systems aren't generally thin-aware, and no mechanism exists to report when capacity is no longer needed. The key to effective thin provisioning is discovering opportunities to reclaim unused capacity; there are essentially two ways to accomplish this:
- The storage array can snoop the data it receives and stores, and attempt to deduce when opportunity exists to reclaim capacity
- The server can be modified to send signals to the array, notifying it when capacity is no longer used
The first option is difficult to achieve but can be very effective, since operating system vendors don't seem eager to add thin-enhancing features to their file systems. Products like Data Robotics Inc.'s Drobo storage systems snoop on certain known partition and file system types to determine which disk blocks are unused and then reclaim them for future use. But that approach is extremely difficult in practice given the huge number of operating systems, applications and volume managers in use.
Therefore, the key topic in enterprise thin provisioning involves the latter approach: improving the communication mechanism between the server and storage systems.
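The difference a server-side signal makes can be sketched as follows. The classes here are hypothetical, and the `unmap` method is only a stand-in for an UNMAP- or TRIM-style hint; it does not model the actual SCSI or ATA command formats.

```python
class ThinVolume:
    """Array-side view: which virtual pages are backed by real capacity."""

    def __init__(self):
        self.mapped = set()

    def write(self, page):
        self.mapped.add(page)

    def unmap(self, pages):
        """Stand-in for an UNMAP/TRIM-style hint sent by the server."""
        self.mapped.difference_update(pages)


class FileSystem:
    """Server-side view: which pages belong to which file."""

    def __init__(self, volume, notify_on_delete):
        self.volume = volume
        self.notify_on_delete = notify_on_delete
        self.files = {}  # file name -> list of pages

    def create(self, name, pages):
        self.files[name] = list(pages)
        for page in self.files[name]:
            self.volume.write(page)

    def delete(self, name):
        pages = self.files.pop(name)
        if self.notify_on_delete:  # a thin-aware file system tells the array
            self.volume.unmap(pages)


def mapped_after_delete(notify_on_delete):
    volume = ThinVolume()
    fs = FileSystem(volume, notify_on_delete)
    fs.create("old.log", range(0, 8))
    fs.create("db.dat", range(8, 10))
    fs.delete("old.log")
    return len(volume.mapped)


print(mapped_after_delete(False))  # 10: the array still holds the deleted pages
print(mapped_after_delete(True))   # 2: only the live file remains mapped
```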
Zero page reclaim
Perhaps the best-known thin-enabling technology is zero page reclaim. It works something like this: The storage array divides storage capacity into "pages" and allocates them to store data as needed. If a page contains only zeroes, it can be "reclaimed" into the free-capacity pool. Any future read requests will simply result in zeroes, while any writes will trigger another page being allocated. Of course, no technology is as simple as that.
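In its idealized form, the behavior described above looks something like this sketch. It assumes the array can spot an all-zero page at write time; the class name and the 4 KB page size are illustrative choices, not taken from any product.

```python
PAGE_SIZE = 4096              # illustrative; real arrays use a range of page sizes
ZERO_PAGE = bytes(PAGE_SIZE)  # a page consisting entirely of zeroes


class ZeroReclaimVolume:
    """Toy volume that never stores an all-zero page."""

    def __init__(self):
        self.pages = {}  # page index -> stored data

    def write(self, index, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only handles whole-page writes")
        if data == ZERO_PAGE:
            self.pages.pop(index, None)  # release the page to the free pool
        else:
            self.pages[index] = data     # allocate (or overwrite) a page

    def read(self, index):
        # Unallocated pages simply read back as zeroes.
        return self.pages.get(index, ZERO_PAGE)

    def allocated_pages(self):
        return len(self.pages)


vol = ZeroReclaimVolume()
vol.write(0, b"\x01" * PAGE_SIZE)
print(vol.allocated_pages())       # 1
vol.write(0, ZERO_PAGE)            # the server zeroes the page
print(vol.allocated_pages())       # 0
print(vol.read(0) == ZERO_PAGE)    # True
```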
Actually writing all those zeroes can be problematic, however. It takes just as much CPU and I/O effort to write a 0 as a 1, and inefficiency in these areas is just as much a concern for servers and storage systems as storage capacity. The T10 Technical Committee on SCSI Storage Interfaces has specified a SCSI command (WRITE_SAME) to enable "deduplication" of those I/Os, and this has been extended with a so-called "discard bit" to notify arrays that they need not store the resulting zeroes.
Most storage arrays aren't yet capable of detecting whole pages of zeroes on write. Instead, they write them to disk and a "scrubbing" process later detects these zeroed pages and discards them, so they appear used until they're scrubbed and discarded. This process can be run on an automated schedule or manually initiated by an administrator. And some arrays only detect zeroed pages during a mirror or migration, further reducing capacity efficiency.
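A post-process scrub of this kind might be sketched as below. The page map is a plain dictionary and the function name is hypothetical; a real array would also have to coordinate the scan with in-flight writes.

```python
def scrub(pages, page_size):
    """Drop all-zero pages from a {page index: bytes} map.

    Returns the sorted indexes of the pages that were reclaimed.
    """
    zero = bytes(page_size)
    reclaimed = [index for index, data in pages.items() if data == zero]
    for index in reclaimed:
        del pages[index]
    return sorted(reclaimed)


# Zeroed pages are written to disk first and count as used...
pages = {
    0: b"\x01" * 8,
    1: bytes(8),
    2: bytes(8),
    3: b"\x00" * 7 + b"\x01",  # one non-zero byte keeps the page allocated
}
print(len(pages))        # 4
# ...until the scheduled or manually started scrub finds them.
print(scrub(pages, 8))   # [1, 2]
print(len(pages))        # 2
```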
Even if an array has a feature-complete zero page reclaim capability, it will only be functional if zeroes are actually written. The server must be instructed to write zeroes where capacity is no longer needed, and that's not the typical default behavior. Most operating systems need a command, like Windows' "sdelete -c" or something on the order of NetApp's SnapDrive, to make this happen, and these are only run occasionally.
Some applications, including VMware ESX, do indeed zero out new space, and ESX's "eagerzeroedthick" disk format zeroes a virtual disk's entire allocation when it is created. Although certain compatibility issues remain, notably with VMotion, ESX is becoming increasingly thin-aware. The vStorage APIs for Array Integration (VAAI), added in ESX 4.1, include native "block zeroing" support for certain storage systems. ESX uses a plug-in, either a special-purpose one or the generic T10 WRITE_SAME support, to signal an array that VMFS capacity is no longer needed.
Symantec Corp. is also leading the charge to support thin provisioning. The Veritas Thin Reclamation API, found in the Veritas Storage Foundation product, includes broad support for most major storage arrays. It uses a variety of communication mechanisms to release unneeded capacity, and is fully integrated with the VxFS file system and volume manager. Storage Foundation also includes the SmartMove migration facility, which assists thin arrays by only transferring blocks containing data.
Thin awareness in other systems is coming more slowly. Another standard command, ATA TRIM, is intended to support solid-state storage, but it could also send thin reclamation signals, along with its SCSI cousin, UNMAP. Windows and Linux now support TRIM, and could therefore add thin provisioning support in the future as well. They could also modify the way in which storage is allocated and released in their file systems.
Thin provisioning is not without its challenges, but the benefits are many. It's one of the few technologies that can improve real-world storage utilization even when the core issue isn't technology related. Indeed, the ability of thin provisioning to mask poor storage forecasting and allocation processes contributed to the negative image many, including me, had of it. But as the technology improves and thin reclamation becomes more automated, this technology will become a standard component in the enterprise storage arsenal.
Thin provisioning and TCO
Comparing the total cost of ownership (TCO) for enterprise storage solutions is controversial, with self-serving and incomplete models the norm for storage vendors. Before spending money on cost-saving, efficiency-improving technologies like thin provisioning, it's wise to create a model internally to serve as a reality check for vendor assumptions and promises.
A complete TCO includes more than just the cost of hardware and software -- operations and maintenance, data center costs and the expenses associated with purchasing, migrating and decommissioning storage arrays must be considered. And it's a good idea to consider the multiplier effect of inefficient allocation of resources: Leaving 1 GB unused for every one written doubles the effective cost of storage. With end-to-end storage utilization averaging below 25%, this multiplier can add up quickly.
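The multiplier is simply the reciprocal of utilization, as this small calculation shows. The function names and the $5-per-GB figure are placeholders for illustration, not real pricing.

```python
def cost_multiplier(utilization):
    """Effective-cost multiplier for a given fraction of capacity in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be greater than 0 and at most 1")
    return 1 / utilization


def effective_cost_per_gb(raw_cost_per_gb, utilization):
    """Cost of each gigabyte actually holding data."""
    return raw_cost_per_gb * cost_multiplier(utilization)


print(cost_multiplier(0.50))               # 2.0: 1 GB unused for every 1 GB written
print(cost_multiplier(0.25))               # 4.0: utilization at 25%
print(effective_cost_per_gb(5.00, 0.25))   # 20.0, using a hypothetical $5 per raw GB
```

Raising utilization from 25% to 50% in this model halves the effective cost of every gigabyte stored, without buying a single denser drive.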
Such cost models often reveal the startling fact that storage capacity on hard disk drives (or newer solid-state drives) is a small component of TCO -- often less than 15% of total cost. But that doesn't mean driving better capacity utilization is a wasted effort. Eliminating the multiplier effect from inefficient utilization can have a far greater impact on TCO than merely packing more bits onto a disk drive.
Consider the operational impact of thin provisioning, as well as its mechanical impact on storage density. Thin systems may require less administration because capacity can be allocated without traditional constraints, but that could lead to a nightmare scenario of overallocated arrays running out of capacity and bringing apps to a halt. The best thin storage systems are also highly virtualized, flexible and instrumented, allowing improved operational efficiency and high utilization.
BIO: Stephen Foskett is an independent consultant and author specializing in enterprise storage and cloud computing. He is responsible for Gestalt IT, a community of independent IT thought leaders, and organizes their Tech Field Day events. He can be found online at GestaltIT.com, FoskettS.net and on Twitter at @SFoskett.