This article can also be found in the Premium Editorial Download "Storage magazine: iSCSI: Ready for prime time?."
There's a school of thought that thin provisioning is a form of virtualization. That's because thin provisioning works, at a very basic level, as an atomizer: it chops allocated storage into small chunks or blocks, which are then spread automatically across the components of the pool. This removes the traditional dependency of a LUN on a single RAID set (often called an array or RAID group), which is simply a group of drives bound together by a common scheme for reading and writing data. By inserting thin provisioning as a layer between the LUN and the RAID set, you abstract the physical location of the data (i.e., virtualization), which makes it easier to offer mobility options around that data. For example, Hitachi plans to introduce thin provisioning on virtualized third-party arrays, which means you could create a pool comprising logical disk resources from several arrays. Run out of space on one array? No problem; just add resources from other arrays to the pool and you have more space. It's almost like Virtualization 2.0.
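To make the "atomizer" idea concrete, here's a minimal sketch of the mapping layer described above. It is not any vendor's implementation; the chunk size, names and round-robin placement policy are all assumptions chosen for illustration.

```python
# Illustrative sketch only (not a real array's algorithm): a thin-provisioned
# LUN allocates fixed-size chunks on demand and maps each chunk to the next
# RAID set in the pool, so the LUN is no longer tied to any one RAID set.
CHUNK_SIZE = 1024 * 1024  # assume 1 MiB chunks; real products vary

class ThinLUN:
    def __init__(self, pool):
        self.pool = pool               # list of RAID-set names in the pool
        self.chunk_map = {}            # virtual chunk index -> (raid_set, physical slot)
        self.next_slot = {rs: 0 for rs in pool}

    def write(self, offset):
        """Allocate backing storage on demand for the chunk covering `offset`."""
        chunk = offset // CHUNK_SIZE
        if chunk not in self.chunk_map:
            # Round-robin placement spreads chunks across the whole pool.
            rs = self.pool[len(self.chunk_map) % len(self.pool)]
            self.chunk_map[chunk] = (rs, self.next_slot[rs])
            self.next_slot[rs] += 1
        return self.chunk_map[chunk]

pool = ["raid_set_A", "raid_set_B", "raid_set_C"]
lun = ThinLUN(pool)
for off in range(0, 4 * CHUNK_SIZE, CHUNK_SIZE):
    lun.write(off)
# Adding another RAID set (even one virtualized from a different array)
# just extends `pool`; existing chunk mappings are untouched.
```

Note that growing the pool never disturbs data already written, which is what makes the "add resources from other arrays" scenario above so painless.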
An additional benefit of thin provisioning is performance. Each disk drive has an IOPS, response-time and MB/sec rating. Create a RAID set and you get the combined effect of multiple drives improving on the numbers a single drive can offer; add cache and you get a further improvement. (I know it's not that simple, but let's assume it is for the sake of comparison.) Under a traditional provisioning scheme, if you exceeded the response-time rating of a RAID group, you had to resort to other aggregation mechanisms such as array- or host-based striping, or concatenation. With thin provisioning, you gain the ability to create a common pool of many RAID sets. When you create a LUN in that pool, you're essentially spreading it over all of those RAID sets, significantly improving its performance numbers over a traditionally provisioned LUN. That's good news for administrators of applications such as Exchange, who no longer have to spread databases over multiple storage groups to meet the IOPS requirements imposed by heavy usage. The same applies to storage administrators: the task of checking the performance of each and every RAID set is minimized to a large extent, because each RAID set, by virtue of being part of a bigger resource pool, now receives only a fraction of the I/O it would have handled on its own.
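The back-of-the-envelope math behind that claim looks roughly like this. The drive rating, RAID-set width and pool size below are assumed figures for illustration, and (as the article concedes) cache and controller overhead are ignored.

```python
# Rough aggregation arithmetic, with assumed figures (not vendor specs):
drive_iops = 180            # assumed rating for one enterprise drive
drives_per_raid_set = 8
raid_sets_in_pool = 10

one_raid_set = drive_iops * drives_per_raid_set    # what one RAID set can offer
whole_pool = one_raid_set * raid_sets_in_pool      # what the pooled LUN can draw on

# A hypothetical 10,000-IOPS Exchange workload exceeds a single RAID set,
# but spread thinly across the pool each RAID set sees only a tenth of it.
per_raid_set_load = 10_000 / raid_sets_in_pool
```

With these numbers, one RAID set tops out at 1,440 IOPS, while the pooled LUN can draw on 14,400, and each RAID set carries a comfortable 1,000 IOPS of the workload.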
The biggest challenge is knowing where your data lives, and whether it can be tracked or recovered after a catastrophic component failure. With a traditionally provisioned LUN, the LUN's boundaries are well established along the cylinders of the disks in its RAID set. Sure, a disk can fail, but let's face it: how often does a protected RAID set fail outright? In the world of thin provisioning, the LUN is constructed and maintained in memory, or virtual space. Not only is the LUN distributed across multiple RAID sets, but recovering it after a subsystem failure could be a daunting task. Fans of thin provisioning will likely dismiss this as fear-mongering, but in my opinion it's a legitimate concern. Vendors should be compelled to provide reliable methods of recovery for thin-provisioned resources.
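The recovery concern can be sketched in a few lines. The point of this illustration (names and layout are hypothetical) is that with thin provisioning, only the chunk map knows where each virtual block physically lives; lose that metadata and the surviving chunks on the RAID sets are effectively unordered.

```python
# Sketch of the recovery concern: a traditionally provisioned LUN occupies
# contiguous, predictable space on one RAID set, but a thin-provisioned LUN
# is reassembled entirely from its chunk map. No map, no LUN.
chunk_map = {  # hypothetical mapping: virtual chunk -> (raid_set, physical slot)
    0: ("raid_set_B", 17),
    1: ("raid_set_A", 4),
    2: ("raid_set_C", 9),
}

def locate(virtual_chunk, mapping):
    """Find a chunk's physical home; impossible if the map is gone."""
    loc = mapping.get(virtual_chunk)
    if loc is None:
        raise LookupError("chunk map lost: data location unrecoverable")
    return loc
```

This is why the article argues that vendors owe users robust, verifiable recovery mechanisms for the mapping metadata itself, not just for the underlying RAID sets.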
This was first published in April 2008