|Heterogeneous tiered storage|
In this approach, different technologies are used at each tier. Equipment costs are low, but moving data between tiers can be problematic.
Tiered storage is one of those topics I'm hearing a lot about lately. But actually moving to a tiered storage strategy can be tricky. How do you architect your solution? And how do you move data as its value changes?
When most people think of storage tiers, they think of the same thing: Different types of storage arrays at different price points that are designated to hold different classes of data. Imagine an infrastructure that contains the components shown in "Heterogeneous tiered storage" on this page. This mythical company has four tiers of storage in use from four different vendors. Such a strategy has been proposed by many, and it makes some sense. The tiers are divided according to technical functionality, price and service levels.
If there's a benefit to mixing enterprise, midrange and internal storage, there's also clearly an underlying problem: How can you move data between tiers when the requirements for that data change? For example, the data collected in a lab might become business critical once a product based on it is developed. Moving to a different storage system usually requires a complete migration project, with both downtime and a substantial risk of failure. What if there were a simple way to add data mobility to a tiered storage environment?
Virtualization products seem to answer this question, but few organizations have been willing to deploy them. Volume management products such as Veritas Volume Manager can help, too. But what if data migration weren't required at all?
|Homogeneous tiered storage|
By varying protection of volumes within an enterprise array, you can achieve a more manageable storage architecture. Initial equipment costs may be higher than with the heterogeneous approach.
Tiering data on a single array
A single array strategy would be viable if you could adjust parameters of the storage within the array to meet different requirements. Some of these parameters include RAID protection (availability), replication (recoverability) and cost. Many arrays have the ability to replicate data, and some can vary the level of RAID protection. Different RAID levels yield different levels of availability and recoverability of data, and they are important to the cost factor as well.
A single enterprise-class array can easily scale to hold tens or even hundreds of terabytes of data, and can service dozens of hosts over a SAN. High-end arrays could hold the majority of a company's data, but the challenge of reducing costs remains. Lately, more businesses are considering creating different classes of storage and price points within a single array by varying the level of RAID protection, mirroring, replication and even drive types. And at least one vendor is bringing a specialty array to market to service this need.
The simplest way to vary the class of service is to turn off the features that enterprise arrays are known for. This is actually the standard practice for most organizations, whether or not they realize it. High-value data might have multiple internal point-in-time copies or business-continuance volumes (BCVs), while lower-value data does not. Similarly, only a small portion of an array's contents is normally replicated off site for disaster recovery purposes. These two practices create de facto tiers of service, priced very differently from the nonmirrored or nonreplicated data in the same array.
Replication is expensive. The bandwidth to replicate a terabyte of storage out of state for one year could easily cost more than all of the hardware involved. So varying the replication of data is a sensible way to implement storage tiers. BCVs can also get costly if there are too many in use, and there's often a large part of enterprise storage set aside for use as BCVs even if they are never used. But turning off these features just brings ridiculously high prices down to simply expensive.
One recent approach to bringing prices down is to vary the RAID levels within the array. Arrays have long had the ability to protect some disks with RAID-1 and others with RAID-5. RAID-1 has 50% overhead, meaning that each disk has another used just for protection. RAID-5 has much less overhead, usually 20% or 25%, but sometimes as low as 11%. The overhead depends on the ratio of data disks to parity disks in a RAID set. A typical RAID-5 set containing four data disks and one parity disk would have 20% overhead (one disk in five holds parity) and would be expressed as a 4+1 set.
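To make the overhead figures concrete, here is a minimal Python sketch that computes protection overhead as the share of raw disks devoted to mirroring or parity. The function name `protection_overhead` and the table of layouts are illustrative assumptions, not part of any vendor's tooling.

```python
from fractions import Fraction


def protection_overhead(data_disks: int, protection_disks: int) -> Fraction:
    """Return the fraction of raw capacity consumed by mirroring or parity.

    Hypothetical helper: a RAID-1 pair is modeled as 1 data + 1 mirror disk,
    and a RAID-5 "N+1" set as N data disks + 1 parity disk.
    """
    if data_disks <= 0 or protection_disks < 0:
        raise ValueError("need at least one data disk and no negative counts")
    return Fraction(protection_disks, data_disks + protection_disks)


# Layouts mentioned in this column, written as (data disks, protection disks).
LAYOUTS = {
    "RAID-1 (1+1)": (1, 1),
    "RAID-5 (3+1)": (3, 1),
    "RAID-5 (4+1)": (4, 1),
    "RAID-5 (7+1)": (7, 1),
    "RAID-5 (8+1)": (8, 1),
}

if __name__ == "__main__":
    for name, (data, protection) in LAYOUTS.items():
        overhead = protection_overhead(data, protection)
        print(f"{name}: {float(overhead):.1%} overhead")
```

Run as a script, this prints one line per layout. The 4+1 set comes out at 20%, the 3+1 set at 25% and the 8+1 set at roughly 11%, matching the figures quoted above.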
HDS has long offered both RAID-1 and RAID-5 (3+1 and 7+1) in its Lightning arrays, but IBM Corp. only recently introduced RAID-10 in addition to the usual RAID-5 on its Shark. EMC Corp. used to be reluctant to offer levels of protection below mirroring. Its old RAID-S was roughly equivalent to 4+1 RAID-5, but was rarely used. Recently, EMC added parity RAID to the DMX series, which is analogous to 3+1 or 7+1 RAID-5. EMC now claims that the majority of Symmetrix DMX systems use at least some parity RAID.
These arrays also have a variety of lower-performance bulk disk available. EMC and HDS claim to support dynamic conversion of LUNs between RAID-1 and RAID-5 protection. This allows an administrator to modify the protection without the downtime associated with migrating storage between different arrays.
Given a base cost of $50,000 per terabyte for raw storage, one usable terabyte of RAID-1 will cost $100,000, while a RAID-5 (4+1) set would cost just $62,500. A larger RAID-5 (8+1) set would lower this to $56,250, but would offer much less protection. This sounds great, but it's not the whole story. Fifty percent of the cost of an enterprise storage array is nonhardware, including software, maintenance and implementation services. Plus, there's a steep base price for the storage array itself before taking the cost of disks into account. For this reason, the savings of varying RAID levels can prove illusory. If physical disks make up just 25% of the cost of storage, saving 50% on disks only shaves 12.5% off the total cost.
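The arithmetic in this paragraph can be checked with a short Python sketch. It assumes, as the paragraph does, $50,000 per raw terabyte and a cost that scales linearly with raw capacity. The function names are hypothetical, and the 25% disk share and 50% disk saving are the illustrative figures from the text rather than measured data.

```python
RAW_COST_PER_TB = 50_000  # dollars per raw terabyte, the figure assumed in the text


def cost_per_usable_tb(raw_cost_per_tb: float, data_disks: int, protection_disks: int) -> float:
    """Cost of one usable terabyte once mirror or parity disks are paid for."""
    if data_disks <= 0 or protection_disks < 0:
        raise ValueError("need at least one data disk and no negative counts")
    raw_tb_needed = (data_disks + protection_disks) / data_disks
    return raw_cost_per_tb * raw_tb_needed


def total_cost_reduction(disk_share_of_total: float, disk_savings: float) -> float:
    """Fraction of the total bill saved when only the disk portion shrinks."""
    return disk_share_of_total * disk_savings


if __name__ == "__main__":
    for name, data, protection in [
        ("RAID-1 (1+1)", 1, 1),
        ("RAID-5 (4+1)", 4, 1),
        ("RAID-5 (8+1)", 8, 1),
    ]:
        cost = cost_per_usable_tb(RAW_COST_PER_TB, data, protection)
        print(f"{name}: ${cost:,.0f} per usable TB")

    # Disks are 25% of total cost and the disk bill is cut in half.
    print(f"Total saving: {total_cost_reduction(0.25, 0.50):.1%}")
```

The second function is the point of the exercise: however large the saving on disks, it is scaled down by the share of the total bill that disks actually represent.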
One newer vendor rates a mention here. 3PAR, Fremont, CA, has a completely virtualized array--the InServ--that not only supports multiple RAID levels but can dynamically reconfigure volumes to use different levels. It allows an administrator to vary the protection, cost and performance of an individual volume. What makes this different from the leading arrays is 3PAR's low entry point and scalability. The InServ can start with just two controllers and grow to include eight. 3PAR promises a base price similar to the traditional midrange arrays and capacity well beyond the big boys of the enterprise space. 3PAR also has a feature called thin provisioning that can vastly reduce the storage required, but that's beyond the scope of this column.
Even if varying RAID protection levels is often less than rewarding, the benefits of keeping all of an enterprise's storage on a single type of array are significant. Put a network-attached storage (NAS) head on an enterprise array and all three tiers of storage can be hosted on a single storage platform. This makes management simpler, reducing the need for cross-training and heterogeneous management software. And the ability to seamlessly move data between tiers by varying the protection applied to it is tremendously important to maintaining a utility model. Who would want to change storage tiers if the change required an outage? "Homogeneous tiered storage" on this page shows what this environment would look like.
Your choice depends on your desired outcome. If you are concerned primarily with reducing capital costs, then the traditional tiered strategy is for you. Midrange arrays--especially with the introduction of ATA disks--will always beat enterprise arrays on price. But if your environment is dynamic or you require the simplest management possible, then the enterprise array approach is definitely worth a look.