Published: 12 Apr 2008
It seems like everyone is talking about thin provisioning. It's no miracle cure, but the technology can do wonders.
But thin provisioning isn't new. Companies such as 3PAR and other smaller players have been selling it as a core feature of their product lines for a while. With the buzz around virtualization, this technology has suddenly become interesting to more players because it offers a new approach to how storage is provisioned.
Is thin provisioning truly a miracle technology? Not really. It's not a tool that can magically provide you with more capacity than you purchased. However, it does let you allocate only what's accessed while allowing the host to think it has access to everything it sees. For example, a 300GB logical unit is provisioned to a host. After creating a file system on it, only 100GB is used. At that point, the array intelligently figures out that the remaining 200GB isn't being used and keeps it in reserve. Yet the host happily continues to think it has access to the remaining 200GB.
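To make the 300GB example concrete, here is a minimal Python sketch of allocate-on-write. The `ThinLun` class, its 1GB chunk size and its method names are all hypothetical, chosen only for illustration; real arrays use their own chunk sizes and bookkeeping.

```python
class ThinLun:
    """Toy thin-provisioned LUN: backing chunks are allocated on first write."""

    def __init__(self, advertised_gb, chunk_gb=1):
        self.advertised_gb = advertised_gb  # size the host sees
        self.chunk_gb = chunk_gb            # allocation granularity (assumed)
        self.allocated = set()              # chunk indexes with real storage behind them

    def write(self, offset_gb):
        """Record a write at the given offset, allocating its chunk if needed."""
        if not 0 <= offset_gb < self.advertised_gb:
            raise ValueError("write beyond the advertised LUN size")
        self.allocated.add(offset_gb // self.chunk_gb)

    @property
    def consumed_gb(self):
        """Capacity actually drawn from the array."""
        return len(self.allocated) * self.chunk_gb

    @property
    def reserve_gb(self):
        """Capacity the host can see but the array hasn't had to back yet."""
        return self.advertised_gb - self.consumed_gb


lun = ThinLun(advertised_gb=300)
for offset in range(100):  # the file system touches the first 100GB
    lun.write(offset)

print(lun.advertised_gb, lun.consumed_gb, lun.reserve_gb)  # 300 100 200
```

The host still sees 300GB, but only the 100GB that was written consumes real capacity.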
Several vendors have deployed this technology in the form of a snapshot or copy-on-write feature for point-in-time copies. This allows customers to save big on storage purchases made for replica copies. The underlying premise is that change rates are never more than 20% to 30%. If you apply this concept to your primary storage, this suddenly becomes an interesting way to overprovision. But what happens if a host (or a set of hosts) suddenly needs more space? Again, there's a solution. If you pool all of the resources into a single bucket (often known as the resource pool), then any server or set of servers can borrow against its allocated resources on an as-needed basis.
The caveat is that not all of the servers sharing the pool can borrow space at the same time. Think of it like your local savings bank. The bank pools funds into a single bucket and invests them with the assumption that not everyone will withdraw all of their money at the same time.
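The savings-bank analogy can be sketched in a few lines of Python. Everything here (`ResourcePool`, `PoolExhausted`, the host names and the capacities) is made up for illustration and doesn't reflect any vendor's implementation.

```python
class PoolExhausted(Exception):
    """Raised when hosts try to draw more than the pool physically holds."""


class ResourcePool:
    """Toy shared pool: promises can exceed physical capacity, usage can't."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.promised = {}  # host -> GB the host believes it owns
        self.consumed = {}  # host -> GB actually drawn from the pool

    def provision(self, host, advertised_gb):
        self.promised[host] = advertised_gb
        self.consumed[host] = 0

    def grow(self, host, gb):
        """Draw more physical space for a host, if the pool can cover it."""
        if self.consumed[host] + gb > self.promised[host]:
            raise ValueError(f"{host} would exceed its advertised size")
        if gb > self.free_gb:
            raise PoolExhausted(f"{host} asked for {gb}GB, only {self.free_gb}GB left")
        self.consumed[host] += gb

    @property
    def free_gb(self):
        return self.physical_gb - sum(self.consumed.values())

    @property
    def overcommit_ratio(self):
        return sum(self.promised.values()) / self.physical_gb


pool = ResourcePool(physical_gb=500)
for host in ("db", "mail", "web"):
    pool.provision(host, advertised_gb=300)  # 900GB promised against 500GB
    pool.grow(host, 100)

pool.grow("db", 150)       # one host borrowing from the pool is fine
try:
    pool.grow("mail", 100)  # a second large withdrawal is a run on the bank
except PoolExhausted as err:
    print("pool exhausted:", err)
```

Tracking `free_gb` and `overcommit_ratio` over time is the kind of trend analysis that tells you when to add capacity before the second withdrawal arrives.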
So does this mean you can purchase less storage than you need? Not really. But it lets you manage your resources more effectively, rather than cater to storage requirements in an ad-hoc manner. You now have better control over allocation, and you can perform a trend analysis of how storage is consumed as a function of the global resource pool.
There's a certain school of thought that thin provisioning is a form of virtualization. That's because thin provisioning works, at a very basic level, as an atomizer that chops allocated storage into smaller chunks or blocks. These chunks are then spread across the various components of the pool in an automated manner, thereby removing the traditional dependency of a LUN on a RAID set (often known as an array or RAID group), which is nothing more than a group of drives tied together by a common scheme for reading and writing data. By introducing thin provisioning as a layer between a LUN and a RAID set, one can abstract the physical location of the data (i.e., virtualization) and make it easier to provide a set of mobility options around it. For example, Hitachi plans to introduce thin provisioning on virtualized third-party arrays. That means you could create a pool comprising logical disk resources from several arrays. Run out of space on one array? No problem; you simply add resources from other arrays into this pool and you have more space. It's almost like Virtualization 2.0.
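Here is a rough sketch of that mapping layer in Python. The round-robin placement, the `place_chunks` function and the RAID set names are assumptions made for illustration; actual arrays use their own placement algorithms.

```python
from itertools import cycle


def place_chunks(lun_chunks, raid_sets):
    """Map each logical chunk of a LUN to a (raid_set, slot) pair, round-robin."""
    next_slot = {rs: 0 for rs in raid_sets}  # next free slot on each RAID set
    mapping = {}
    for chunk, rs in zip(range(lun_chunks), cycle(raid_sets)):
        mapping[chunk] = (rs, next_slot[rs])
        next_slot[rs] += 1
    return mapping


# A pool built from RAID sets on two different (hypothetical) arrays.
pool = ["array1/rg0", "array1/rg1", "array2/rg0"]
mapping = place_chunks(6, pool)

for chunk, location in mapping.items():
    print(chunk, "->", location)
```

The host addresses chunks 0 through 5 of one LUN; only the mapping table knows that they live on three RAID sets across two arrays. That table is also exactly what has to survive a failure for the LUN to be recoverable.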
An additional benefit of thin provisioning is performance. Each disk drive has an IOPS, response time and MB/sec rating. Create a RAID set and you have the combined effect of multiple drives working to improve on the numbers that a single drive can offer. Add cache and you get more improvement. (I know it's not that simple, but let's assume it is for comparison.) In a traditional provisioning scheme, if you exceeded the IOPS rating of a RAID group, you had to resort to other aggregation mechanisms such as array or host-based striping, or concatenation. With thin provisioning, you suddenly acquire the ability to create a common pool of lots of RAID sets. When you create a LUN from this pool, you're essentially spreading the LUN over all of these RAID sets and significantly improving the performance numbers over a traditionally provisioned LUN. This is good news for administrators of applications such as Exchange. They no longer have to spread databases over multiple storage groups to meet the IOPS requirements imposed by heavy usage. The same applies to storage administrators, as the task of checking on the performance of each and every RAID set is minimized to a large extent. This is because each RAID set, by virtue of being part of a bigger resource pool, now gets only a fraction of the I/O load it would have received if it weren't part of the pool.
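The back-of-the-envelope arithmetic looks like this in Python. The rating and workload figures are invented for illustration, and the sketch assumes I/O is spread perfectly evenly while ignoring cache and RAID write penalties.

```python
def per_raid_set_iops(workload_iops, raid_sets):
    """IOPS each RAID set must serve if the load is spread evenly (assumed)."""
    return workload_iops / raid_sets


RAID_SET_RATING = 1000  # IOPS one RAID set can sustain (hypothetical figure)
WORKLOAD = 3000         # IOPS the application generates (hypothetical figure)

traditional = per_raid_set_iops(WORKLOAD, raid_sets=1)
pooled = per_raid_set_iops(WORKLOAD, raid_sets=8)

print("traditional LUN:", traditional, "over rating:", traditional > RAID_SET_RATING)
print("pooled LUN:     ", pooled, "over rating:", pooled > RAID_SET_RATING)
```

With these numbers, a single RAID set is asked for three times what it can deliver, while each of eight pooled RAID sets sees 375 IOPS, well inside its rating.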
The biggest challenge is knowing where your data lives, and whether it can be tracked or recovered if there's a catastrophic component failure. In the case of a traditionally provisioned LUN, the boundaries of the LUN are well established along the cylinders of disks in a RAID set. Sure, you can have a disk fail. But let's face it, how many times do you have a protected RAID set fail? In the world of thin provisioning, the LUN is constructed and maintained in memory or virtual space. Not only is it distributed across multiple RAID sets, but recovery from a failure of the subsystem could be a daunting task. Fans of thin provisioning will likely dismiss this as fear-mongering but, in my opinion, it's a legitimate concern. Vendors should be compelled to provide reliable methods of recovery for thin-provisioned resources.
The other issue is that converting to thin provisioning isn't always easy. You might argue that if a host is using only 20% of its allocated storage, then the unused 80% should become available as free storage once a conversion is complete. That's easier said than done. For starters, the conversion process isn't that simple. Most vendors don't support an online or transparent conversion. Those that do depend on host-based tools. More importantly, block-level copy tools, such as volume manager mirroring, don't provide the necessary effect because they're designed to update every block on the target storage as part of the mirroring or copy process. It doesn't matter if a block is unused on the source storage; if it's been allocated, it must be mirrored. The target storage then treats a written block as an "accessed" block and the effect of thin provisioning is lost. As a result, some vendors insist that to take advantage of thin provisioning, you need to use it for net-new storage or use conventional file-level methods such as tar or cp, which write only the data that's actually in use, to migrate the data.
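A small Python sketch shows why the copy method matters. The two functions are hypothetical stand-ins for a volume manager mirror and a tar or cp style copy, and the 100-block source with 20 blocks in use mirrors the 20% example above.

```python
def block_mirror(source, target):
    """Block-level copy: writes every allocated source block, used or not."""
    for index in source:
        target.add(index)


def file_copy(source, target):
    """File-level copy: writes only blocks that hold live file data.

    A real file copy would lay data out in new block positions on the
    target; only the number of blocks written matters for this sketch.
    """
    for index, in_use in source.items():
        if in_use:
            target.add(index)


# Source LUN: 100 allocated blocks, of which the file system uses 20.
source = {index: index < 20 for index in range(100)}

mirrored_target = set()  # blocks the thin target had to back after a mirror
copied_target = set()    # blocks the thin target had to back after a file copy

block_mirror(source, mirrored_target)
file_copy(source, copied_target)

print("blocks consumed after mirror:   ", len(mirrored_target))
print("blocks consumed after file copy:", len(copied_target))
```

The mirrored target ends up fully allocated, so nothing is saved; the file-level copy consumes only the 20 blocks that hold data.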
Unfortunately, thin provisioning hasn't yet evolved to the level of content-addressable storage, where there's a timestamp associated with every block and the block is "retired" depending on its last access time. Perhaps, someday, we'll have that functionality.
Thin provisioning isn't a revolutionary concept, but a novel (and promising) way to address allocation and performance. You must remember that it's not a magical solution to every problem in your storage world. When one of those turns up, I'll be sure to let you know.