Remember those light beer commercials back in the 1980s with competing contingents shouting “Tastes great!” and “Less filling!” at each other? The idea was that a beer could have fewer calories without sacrificing taste. Perhaps advocates of automated storage tiering (AST) are taking a similar approach: its two goals -- lower cost and higher performance -- seem to be just as diametrically opposed. Historically, if you wanted higher I/O performance (data throughput) you bought high-end Fibre Channel (FC) arrays and disk devices. If budget was a bigger issue, you gravitated toward IP storage and SATA drives.
In practice, most companies use both types of storage in an effort to match application throughput requirements with budget constraints. That effectively represents tiered storage, and how that tiering is managed boils down to whether the staff chooses de facto manual tiering or implements an automated system. Given the increasing complexity of data storage environments, data growth and the typically poor utilization of storage, it’s hard to imagine how manual tiering management is tenable for the long term.
A delicate balance: cost and performance
When storage vendors speak of their AST solutions, they all tout higher performance and lower cost. Given the dichotomy between lower cost and higher performance, one wonders whether they’ve somehow discovered a way to repeal the laws of physics. Fortunately for Newtonian science, the answer is no. In fact, AST can’t deliver both lower cost and higher performance simultaneously. What it can do is deliver the performance needed by the application at the lowest possible cost. Thus, it’s more a balancing act between the two objectives.
Storage tiering review
Most IT professionals understand storage tiering, but it’s worth a brief review of the concept. Tiers are defined predominantly by the performance characteristics of the underlying media. Solid-state drives (SSDs) and flash memory are referred to as tier 0; high-speed FC drives such as 15K rpm disks are tier 1; 10K rpm FC and SAS disks are tier 2; and SATA disks spinning at less than 10K rpm are tier 3. These aren’t absolute rules, but they’re typical tier differentiators.
Tiers are implemented in two different ways. The first is intra-array, in which a single array is populated with two or more media types. The second is inter-array, in which arrays with different media types are associated to facilitate data movement. It’s also possible to have both simultaneously in the same configuration.
Automating the tiering process
Neither storage tiering nor AST is a new technology. In fact, Hewlett-Packard (HP) Co. claims to have implemented automated storage tiering in 1996. Nevertheless, the adoption of AST has been relatively slow. That’s because the earliest implementations required a significant effort to classify data and develop the policies that governed data movement between tiers. Most often, data was moved based on age, which is rarely the best arbiter of value.
Current AST implementations use sophisticated algorithms that calculate the usage of data chunks ranging in size from 4 KB up to 1 GB, depending on vendor and settings. Usage is measured as access demand relative to other chunks, because there’s no absolute definition of “high demand.” Data can be elevated to a higher tier during high-demand periods and demoted when demand lessens. The quality of the algorithm determines the value of the product, and the size of the block determines workload suitability. Smaller block sizes are generally better for random I/O, while larger sizes are better for sequential I/O.
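To make the idea of ranking by relative demand concrete, here is a minimal Python sketch. It isn’t any vendor’s algorithm. The names Chunk, plan_placement and planned_moves, the access counts and the per-tier slot counts are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    access_count: int  # I/Os seen in the last monitoring window (hypothetical metric)
    tier: int          # tier the chunk currently lives on (0 = fastest)


def plan_placement(chunks, fast_tier_slots):
    """Map chunk_id -> target tier using relative demand only.

    fast_tier_slots[i] is how many chunks tier i can hold (assumed sizing).
    Chunks that don't fit in any listed tier land on the lowest tier,
    which is numbered len(fast_tier_slots).
    """
    # Busiest chunks first; there is no absolute "hot" threshold.
    ranked = sorted(chunks, key=lambda c: c.access_count, reverse=True)
    plan = {}
    start = 0
    for tier, slots in enumerate(fast_tier_slots):
        for chunk in ranked[start:start + slots]:
            plan[chunk.chunk_id] = tier
        start += slots
    lowest_tier = len(fast_tier_slots)
    for chunk in ranked[start:]:
        plan[chunk.chunk_id] = lowest_tier
    return plan


def planned_moves(chunks, plan):
    """List (chunk_id, from_tier, to_tier) for chunks whose tier changes."""
    return [
        (c.chunk_id, c.tier, plan[c.chunk_id])
        for c in chunks
        if plan[c.chunk_id] != c.tier
    ]


# Hypothetical workload: one slot in tier 0, one in tier 1, the rest in tier 2.
chunks = [
    Chunk(chunk_id=1, access_count=500, tier=2),
    Chunk(chunk_id=2, access_count=20, tier=0),
    Chunk(chunk_id=3, access_count=90, tier=1),
    Chunk(chunk_id=4, access_count=5, tier=2),
]
plan = plan_placement(chunks, fast_tier_slots=[1, 1])
moves = planned_moves(chunks, plan)
```

In this made-up workload, chunk 1 is promoted from tier 2 to tier 0 and chunk 2 is demoted from tier 0 to tier 2, purely because of how their access counts compare with the other chunks. A real product would also weigh the cost of each move and how recently the chunk was accessed.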
Automated tiering: Buying considerations
Shopping for automated tiering for your data storage environment? Keep these key points in mind:
- Understand your application’s data usage characteristics
- Examine management tools to keep the system tuned over time
- Determine the integration of the proposed automated storage tiering (AST) capability with existing tools and vendors
- Decide if you want a “set-and-forget” or customizable AST product
- Remember that AST is a true price-to-performance play, measurable in the money saved on devices
Both established vendors and emerging vendors offer AST capabilities. Some of the newer vendors, such as Dell Compellent, have made automated storage tiering a cornerstone of their product architecture. With the company’s Storage Center product line and its Fluid Data Architecture, there’s only one array architecture and AST is an integrated part of it. Fluid Data Architecture data movement block size is a relatively granular 2 MB.
Similarly, for Avere Systems Inc., AST isn’t an optional feature in its FXT appliances. However, Avere adds the ability to use any network-attached storage (NAS) or JBOD array as tier 3 storage. Thus, Avere offers both inter- and intra-array tiering. In addition, Avere uses its own file system, which gives it an additional measure of control over data movement in its algorithm. FXT is a “set-and-forget” model that doesn’t allow user modification of movement policies, although tiers can be scaled separately to match workload changes.
For Greg Folsom, CIO at Arnold Worldwide, simplicity is the key issue. According to Folsom, Dell Compellent systems are “drop-dead easy” to install and manage. Arnold Worldwide, a Boston-based ad agency, uses a three-tier strategy with two different storage policies. “These things are so easy that even I can be talked through managing them when our storage manager is away from the office,” he joked.
Chris Elam, Arnold Worldwide’s senior systems engineer, began using Dell Compellent’s default automated tiered storage policies but tweaked them over time. Dell Compellent’s Enterprise Manager utility helped Elam identify usage patterns. “Enterprise Manager helped us to see exactly how data is accessed in the system. With this information, we created a tier 1-2 policy for some apps and a tier 2-3 policy for other applications. We’ve been using the system for more than four years and we haven’t had to change the policies in a long time,” Elam said. New volumes are simply assigned to one of the policies at creation time.
Solid-state storage complements tiering
Xiotech Corp. offers another example of a “set-and-forget” AST implementation. Xiotech’s Hybrid ISE product combines SSD and hard disk drives in a sealed 14.4 TB 3U container. Of the 14.4 TB, 1 TB is SSD and the rest comprises 900 GB 10K rpm SAS drives (tier 2). Controller-level software, called Continuous Adaptive Data Placement, automatically manages data placement from the moment of deployment. Although the company provides a graphical ISE Analyzer utility to highlight I/O activity, in practice a user can’t adjust any of the parameters or configuration. The company says it designed Hybrid ISE to never need tuning.
Among the vendors offering more configurable architectures, NetApp Inc. stresses the ability to scale performance and capacity separately. The firm’s Flash Cache (PAM II) product is analogous to tier 0 SSD in other product lines. Though it can support multiple tiers, NetApp said in many cases the tiers can be simplified to two: Flash Cache and either tier 2 or 3. That’s because they’ve found data tends to be either “hot” or “cold” and rarely in between. Buffer cache is used to buffer write activity to avoid performance degradation.
Data block movement size is the most granular at just 4 KB. Although this architecture may require more flash disk than other systems (10% to 20% of total capacity), the elimination of relatively expensive tier 1 hard disks and spreading cold data across more SATA drives can result in the same performance at a lower total cost. Moreover, NetApp combines AST with deduplication and compression on the spinning disk for even greater space efficiency. Because data is managed through the WAFL file system and Data Ontap, it doesn’t need to be “rehydrated” when being elevated from a lower tier to tier 0 as the data becomes hot. The same automated storage tiering capabilities apply across all NetApp product lines.
CERN, the European Organization for Nuclear Research in Geneva, uses NetApp’s Flash Cache on Oracle RAC databases. “Prior to using Flash Cache, we had to size everything based on IOPS regardless of storage utilization,” said Eric Grancher of the CERN IT department. “Now, we can optimize both IOPS and capacity. We have moved from expensive Fibre Channel drives to less-expensive SATA drives. This has resulted in a substantial savings for the organization.” Grancher has found that the NetApp system adapts well to workloads, which keeps management simple. In his experience, overall performance is better when the flash memory is in the storage rather than in the servers. “It makes more sense to have the stable NetApp systems cache the data rather than the database servers, which are restarted more frequently for patching or updates. A data cache on the storage server is already ‘warmed up’ and so eliminates the inevitable periods of poor performance we would suffer with cold server-based caches after each restart,” he said.
EMC Fully Automated Storage Tiering (FAST) is another example of a more configurable system. FAST has an install wizard that applies default configurations for simple deployment; EMC says most users find those defaults sufficient for “set and forget” operation. Other users tap into FAST Tier Advisor, a utility that collects usage statistics over time. Those statistics can be used to apply optimized policies for specific applications. Users can also set the size of the data movement block from 768 KB to 1 GB, depending on whether the reads tend to be random or sequential.
EMC recommends that users start with approximately 3% of capacity in tier 0, 20% in tier 1 and 77% in tier 3. Tier Advisor will track usage and, over time, tier 1 should shrink to little more than a buffer between the higher and lower tiers. In any event, Tier Advisor lets users optimize any of the tiers based on actual usage patterns.
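As a quick worked example of that starting split, the short Python sketch below turns the percentages into capacities. The function name tier_capacities and the 100 TB total are assumptions chosen for illustration; only the 3/20/77 percentages come from EMC’s recommendation as quoted above.

```python
# Starting split quoted above, in percent of total capacity.
STARTING_SPLIT_PCT = {"tier 0": 3, "tier 1": 20, "tier 3": 77}


def tier_capacities(total_tb, split_pct):
    """Return capacity per tier in TB for a given percentage split."""
    if sum(split_pct.values()) != 100:
        raise ValueError("tier percentages must add up to 100")
    return {tier: total_tb * pct / 100 for tier, pct in split_pct.items()}


# Hypothetical 100 TB array.
capacities = tier_capacities(100, STARTING_SPLIT_PCT)
```

For the hypothetical 100 TB array, that works out to 3 TB of tier 0, 20 TB of tier 1 and 77 TB of tier 3.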
Hitachi Data Systems’ (HDS) AST supports the same tool set across all product lines for inter-array tiering. It begins with virtualization to abstract and partition workloads. In fact, HDS recommends application and workload classification rather than data classification. “Organizations should avoid starting out too complex in their tiering strategy,” said Sean Moser, vice president of software at HDS. “Don’t use too many tiers and over-optimize individual applications.” Although HDS supports three tiers, as a practical matter the middle tier becomes a “shock absorber” between higher and lower tiers.
HDS offers a Data Center Management suite that includes configuration management, tuning management and tiered storage management. It provides alerts and a dashboard that gives details by volume, storage pool, service-level agreement (SLA) and peak periods. Using these tools, users can fine-tune the system over time. HDS can also incorporate other vendors’ arrays into the storage pool whereby older systems can be repurposed and used as a data archive. HDS can use spin-down drives for the archive tier to reduce power and cooling requirements.
HP is more traditional in its approach to automated storage tiering. Perhaps because some of its arrays come via a partnership and acquisitions, the AST capabilities vary between product lines. Its high-end P9500 systems, OEM units from HDS, behave very similarly to HDS’s AST implementation, and you can use the P9500 to virtualize other arrays.
HP’s 3PAR product line is a relative newcomer to AST, having rolled out those capabilities approximately a year ago. 3PAR supports three tiers, but it’s largely up to users how to configure them. HP recommends monitoring the applications for usage patterns and then determining what tiers at what sizes to implement. Its Adaptive Optimization tool is available to help with the monitoring and sizing of tiers.
HP’s x9000 scalable NAS uses its own AST as well. In this case, all policies are user generated. HP says automated storage tiering evolves from user policies to automation over time.
IBM’s Easy Tier product is supported on its Storwize V7000, DS8700, DS8800 and SAN Volume Controller products. Currently, Easy Tier supports two tiers, one of which must be solid-state drives. Once every 24 hours, the product analyzes performance metrics and generates a plan to relocate data appropriately. Data relocation occurs in 1 GB extents, which are migrated no more often than every five minutes to avoid performance interruption. Easy Tier is a function of the array and is a no-cost option.
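The pacing described above can be sketched as a simple rate-limited schedule. This Python sketch is one possible reading, not IBM’s implementation: it assumes at most one extent migration starts per five-minute interval and that a plan only covers the 24 hours until the next analysis. The name schedule_migrations and the extent IDs are hypothetical.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=5)    # assumed: one extent migration per interval
PLAN_WINDOW = timedelta(hours=24)  # assumed: plan is replaced at the next analysis


def schedule_migrations(extent_ids, start):
    """Return (start_time, extent_id) pairs spaced MIN_GAP apart.

    Extents that would start after the plan window closes are left out,
    on the assumption that the next daily analysis will reconsider them.
    """
    schedule = []
    for i, extent_id in enumerate(extent_ids):
        offset = i * MIN_GAP
        if offset >= PLAN_WINDOW:
            break
        schedule.append((start + offset, extent_id))
    return schedule


# Hypothetical plan that wants to relocate 300 extents of 1 GB each.
plan_start = datetime(2011, 1, 1, 0, 0)
schedule = schedule_migrations(range(300), plan_start)
```

Under these assumptions, no more than 288 extents (288 GB) can move in one 24-hour cycle, so the remaining extents of the hypothetical 300-extent plan wait for the next analysis.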
Automated tiering market still developing
The good news about automated storage tiering is that the market is robust with many options. The bad news is that the options make comparing implementations rather bewildering. Jerome Wendt, lead analyst and president at DCIG in Omaha, Neb., has some practical advice for evaluating the appropriate solution. “First, users should match the performance needs of the application to the architecture of the product,” he said. “This includes understanding the size of the data block being moved, how often it’s being moved and how it’s moved between tiers.” Wendt further advises that file systems are fairly safe candidates for AST, but that Microsoft Exchange and databases should be approached more cautiously.
BIO: Phil Goodwin is a storage consultant and freelance writer.