Published: 01 Sep 2009
Getting the most out of what you already have isn't just smart, it might be the only way to keep your storage shop alive and well.
Data storage efficiency is often an elusive target for storage managers, but with declining budgets and unchecked capacity demands, getting the most out of your storage resources is more important than ever. Storage vendors say they feel your pain and can help you store more data for less cost than ever before.
But storage managers should press those vendors on the actual savings they'll see with their own data and environments. It's also important to understand that some space-saving technologies may require new monitoring tools or additional storage controllers to ensure they don't crash important applications. In addition, a new technology might not always be the answer; non-technical methods, such as pushing back on users' demands, may do as much to help keep storage budgets under control (see "Just say 'No,'" below).
|Just say 'No'|
Need to cut your storage requirements but don't have the time or money for fancy new hardware, software or consultants? Then try pressing your users on whether they still need all the disk space, copies or performance they insisted on when times were good, said Roger Cox, research vice president at Stamford, Conn.-based Gartner Inc.
"When money was no problem, people overprovisioned their storage," he said. If a user said an application needed half a terabyte, the storage administrator would provision a full terabyte so they wouldn't have to go through the painful process of expanding the volume if the app actually needed more. "Go back, revisit your provisioning policies" and cut back on some of that overprovisioning, Cox said.
Consider asking users to reduce the number of copies they require, or see if they can live with somewhat slower performance by crowding more data onto each drive. And if users insist they can't give anything up? Try charging them for excess copies, overprovisioning or gold-plated performance. With their own budgets strapped, that should quickly help impose some self-discipline, Cox suggested.
The amount of storage under management is growing approximately 40% per year, according to a March 2009 survey of 400 U.S. IT professionals commissioned by Symantec Corp. However, the survey indicated that storage budgets will grow only 15% to 20% in the next year, and 20% to 24% over the next two years. If Symantec's survey paints a not-so-pretty picture of the coming years, Storage magazine's own Purchasing Intentions survey indicates that the pressure is on right now. In that survey, also conducted in March, respondents said they'll add an average of 43 TB of new disk capacity this year, despite seeing their storage budgets dip by 1.9% vs. last year. This crunch is leading many organizations to consider using newer technologies, many made possible by the growing use of storage virtualization, to keep more data available while spending less money.
Here are some tips for using these new tools most effectively (see "Quick list: Efficiency options," below).
|Quick list: Efficiency options|
These technologies can help you run a more efficient storage shop by making better use of your already installed capacity.
Archiving
Moving less important data to slower, and less expensive, storage media is a mature and well-established cost-cutting technique, with more than 75% of respondents to Symantec's survey archiving email and files or planning to do so. What's new is that many are combining archiving with data deduplication and other techniques for greater efficiency.
At Health Alliance Plan of Michigan in Detroit, Dan Trim's storage budget has been falling for the last eight years, but data archiving has cut storage growth rates from as high as 48% down to a much more manageable 14%. Trim, the health insurer's director of infrastructure technology, was pleased with how Symantec's Veritas CommandCentral Storage, among other tools, gave him "a deeper look" into how he was using his storage and "how to back up files over a certain age, get them off to tape, offsite and off my disk," he said.
Perry Fritz, enterprise operations manager at Rockline Industries Inc. in Sheboygan, Wis., estimates he has saved approximately $20,000 in disk arrays since he started archiving the company's email files with managed services provider Mimecast in June 2008. The paper goods manufacturer moved into archiving not to reduce its storage needs, but to speed email retrieval for legal or other purposes. It chose a hosted service because it was less expensive than purchasing archiving hardware for each of its three sites, and provided a central archive point for its 200 GB to 300 GB of email data. The savings on disk purchases, along with the disaster recovery (DR) capabilities, have been among the "added benefits" of going with a managed storage service.
Compression and data deduplication
Compression is one of the oldest methods for saving space and data deduplication is one of the newest, but they're related and each has a role to play in holding down storage spending. Understanding how the technologies differ is the key to using each one most effectively.
Compression uses mathematical algorithms to simplify large or repetitious parts of a file, with different compression products aimed at different use cases and various types of files. Some storage shops use the compression capabilities built into popular operating systems such as Unix, or even low-cost utilities such as WinZip on Windows platforms. Later this year, NetApp will release compression features "covering all the platforms we now cover, including primary storage," said Chris Cummings, NetApp's senior director of data protection solutions.
Data deduplication eliminates duplicate patterns within a data store, and in ideal cases -- such as repeated backups of almost identical files -- vendors claim they can shrink data sets by ratios of 15:1 to 20:1. It's no wonder that 95% of respondents to the Symantec survey are at least discussing data deduplication, with 52% either implementing or having implemented it.
However, deduplication works best on data to which only minor changes are made over time (such as backups of lengthy business documents or engineering plans) rather than data of which only one copy exists, such as a CAT scan stored on a medical system. By contrast, most compression delivers ratios of at least 2:1 "on almost any data set," sometimes with little or no performance hit, said George Crump, lead analyst at storage consulting firm Storage Switzerland.
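The difference between the two techniques is easy to see on a small scale. The Python sketch below is an illustration only, not any vendor's method: it uses zlib for compression and a simple fixed-size-chunk fingerprint scheme for deduplication, and the sample data, the chunk size and the function names are all assumptions made for the example.

```python
import hashlib
import zlib


def compression_ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


def dedupe_ratio(copies, chunk_size=4096):
    """Logical bytes divided by the bytes held in unique fixed-size chunks."""
    unique = {}  # chunk fingerprint -> chunk length
    logical = 0
    for copy in copies:
        for start in range(0, len(copy), chunk_size):
            chunk = copy[start:start + chunk_size]
            logical += len(chunk)
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return logical / sum(unique.values())


def sample_document(n_lines=20000):
    """Made-up, text-like data: one 40-byte record per line."""
    lines = (f"record {i:06d}: status=OK value={i * 7:08d}\n" for i in range(n_lines))
    return "".join(lines).encode()


def nightly_backups(document, nights=10, line_len=40):
    """Each night's copy is the base document with one record changed in place."""
    backups = []
    for night in range(nights):
        copy = bytearray(document)
        start = night * 1000 * line_len
        # Same-length edit, so only the chunk holding this record changes.
        copy[start:start + line_len] = copy[start:start + line_len].replace(b"=OK", b"=NO")
        backups.append(bytes(copy))
    return backups


if __name__ == "__main__":
    doc = sample_document()
    print("compression, single copy:", round(compression_ratio(doc), 1))
    print("dedupe, single copy:", round(dedupe_ratio([doc]), 1))
    print("dedupe, ten nightly backups:", round(dedupe_ratio(nightly_backups(doc)), 1))
```

On a single copy of the data, deduplication has nothing to remove while compression still shrinks it; across ten nearly identical backups, deduplication pulls well ahead. Commercial products typically use more sophisticated chunking, so treat the numbers as directional only.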
In fact, Crump said, deduplication "loses value the closer it gets to primary storage" where there are usually fewer multiple copies. To prevent dedupe from slowing down disk access on primary storage, the deduplication would have to be done after the data arrives on disk, added Andrew Reichman, a senior analyst at Cambridge, Mass.-based Forrester Research Inc. "This will require swap space to write data un-deduplicated and then deduplicate it to a separate set of disk," he said. This "could eliminate the capacity reduction," he added, which is the whole point of deduplication.
Health Alliance Plan's Trim said he's seeing approximately a 50% savings in storage capacity with Symantec's Veritas NetBackup PureDisk.
Different vendors squabble over just where and how to use dedupe. Symantec, for one, is pushing a "dedupe everywhere" strategy, while NetApp's Cummings said he doesn't recommend it "for your tier 1, highly transactional, high IOPS database environment. But we do see it as being safe and having little or no performance impact" for storing virtual servers, tier 2 databases, file services and archiving.
For Chris Watkis, IT director at Grey Healthcare Group Inc. in New York City, data deduplication was an unexpected benefit from his purchase of a FalconStor Software Inc. Virtual Tape Library in 2007. His main goal was to speed backups and restores as the medical marketing company moved into more markets, created bulkier content such as video and held onto that content for longer periods.
But the deduplication reports from the FalconStor appliance showed how much redundant data was being eliminated before backup. Armed with that information and some off-the-shelf storage management tools, Watkis now regularly scans his servers for redundant files and has recovered 40% of the space on his 16 TB storage-area network (SAN). Those savings are critical since he had to cut his storage budget by 20% in fiscal 2009.
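The article doesn't say which tools Watkis uses, but the basic idea of scanning for redundant files can be sketched in a few lines of Python. This is a minimal example under stated assumptions: files count as redundant only if their contents are byte-for-byte identical, and the function names are invented for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, block_size=1 << 20):
    """SHA-256 of a file's contents, read in blocks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def reclaimable_bytes(groups):
    """Space freed if only one copy in each group were kept."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

Grouping by size first means only files that could possibly match are hashed, which keeps the scan cheaper on large shares. Whether a duplicate can actually be deleted is a policy question the script can't answer.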
Joseph Stedler is senior engineer and Dallas data center manager at External IT USA Inc., a managed service provider based in Richardson, Texas. For data deduplication, he considered both a Data Domain Inc. appliance and Veeam Software's Veeam Backup & Replication software, which combines deduplication and backup for VMware ESX virtual servers. Despite its much higher cost, Stedler said, he went with the hardware-based Data Domain appliance because of its higher performance and the ability to replicate data among his various data centers.
The host-based deduplication provided by EMC Corp.'s Avamar has received "a lot of favorable feedback" from Gartner customers, said Roger Cox, research vice president at Stamford, Conn.-based Gartner Inc., as have target-side deduplication products from Data Domain, Diligent Technologies Inc. (which was purchased by IBM Corp. last year) and Quantum Corp. (which is OEMed by EMC).
David Floyer, chief technology officer and co-founder of the Wikibon project, an online IT support community, said NetApp's data dedupe is aimed at primary storage, rather than at backup storage as is the case with other vendors such as Data Domain, and can save 30% in the cost of disk. In calculating total ROI, however, Floyer warns that because the data must still be "rehydrated" to its original state before being used, customers still need enough storage controllers to assure the proper I/O and bandwidth for critical applications. This lowers the overall cost savings, he said, from 30% to 15%.
Users need to look out not only for overall reduction ratios, but for how long it takes a product to compress or deduplicate data, and then return it to its original, readable state, said Greg Schulz, founder and senior analyst at StorageIO Group in Stillwater, Minn. Saving huge amounts of space isn't much good if you can no longer work within your backup or restore windows, he noted.
Thin provisioning
To use an airline analogy, thin provisioning lets storage admins "overbook" disk space by provisioning more disk space than an application is likely to use. Physical capacity is consumed only when the application actually writes data, so space that has been promised but not yet written stays available to other applications instead of sitting allocated but idle.
Just as the gate agent must keep a close eye on how many passengers actually show up, storage administrators need real-time monitoring tools so they know when to add more physical disk or expand the size of logical volumes if too much data shows up for "seats" on the array.
Using thin provisioning, External IT USA's Stedler has allocated 215 TB of space for his VMware servers, backed by only 5.5 TB worth of actual disk. He said he's very pleased with how DataCore Software Corp.'s SANsymphony provides real-time reports about "how much storage has been allocated … vs. how much has been claimed by a virtual machine," and how he can set thresholds for alerts when drives reach a certain utilization level.
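The bookkeeping behind that kind of monitoring can be modeled in a few lines. The Python sketch below is a toy model, not how SANsymphony or any other product works; the class name, the method names and the 80% alert threshold are assumptions chosen for illustration.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool; all sizes are in terabytes."""

    def __init__(self, physical_tb, alert_threshold=0.8):
        self.physical_tb = physical_tb
        self.alert_threshold = alert_threshold  # fraction of physical disk in use
        self.provisioned = {}  # volume -> logical size promised
        self.written = {}      # volume -> capacity actually consumed

    def provision(self, volume, logical_tb):
        """Promise logical space; no physical capacity is consumed yet."""
        self.provisioned[volume] = logical_tb
        self.written.setdefault(volume, 0)

    def write(self, volume, tb):
        """Consume physical capacity as data arrives."""
        if self.written[volume] + tb > self.provisioned[volume]:
            raise ValueError("write exceeds the volume's logical size")
        if self.used_tb() + tb > self.physical_tb:
            raise RuntimeError("pool is out of physical capacity")
        self.written[volume] += tb

    def used_tb(self):
        return sum(self.written.values())

    def overcommit_ratio(self):
        """Logical space promised per unit of physical disk."""
        return sum(self.provisioned.values()) / self.physical_tb

    def needs_attention(self):
        """True once usage reaches the alert threshold."""
        return self.used_tb() / self.physical_tb >= self.alert_threshold
```

Plugging in Stedler's figures, `ThinPool(5.5)` with 215 TB provisioned gives an overcommit ratio of roughly 39:1. That ratio is why the `needs_attention` check matters: the pool runs out of real disk long before any volume reaches its logical limit.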
Forrester Research's Reichman said NetApp has "decent capacity visibility with their Operations Manager tool that's designed to report on NetApp-specific storage," but faulted the company's SANscreen for lacking visibility to the file-system level. He also faulted EMC's Ionix ControlCenter and IBM's Tivoli Storage Productivity Center Suite as lacking the detailed reporting needed to support thin provisioning, and said "none of the big SRM [storage resource management] tools seem to have hit the nail on the head … and I think it's a big reason why users are slow to adopt thin provisioning."
Reichman and other analysts singled out smaller companies such as Compellent Technologies Inc. and 3PAR Inc. as providing such detail on their own platforms, helping drive faster adoption for them.
On the higher end, Gartner's Cox said he's gotten "good feedback" from customers using Hitachi Data Systems' Universal Storage Platform V and Universal Storage Platform VM storage controllers. In general, customers are reluctant to risk the availability of their high-end systems with new technologies such as thin provisioning, Cox said. "And in the Oracle world, where a lot of the high-end systems are installed," he said, the popular database precludes the use of thin provisioning because it pre-formats the disk pools itself.
StorageIO Group's Schulz recommends factoring in the cost of monitoring tools when calculating the benefits of thin provisioning, and warns against creating performance bottlenecks by forcing too many servers to compete for read/write access to too few disks.
Tiered storage
Moving older or less valuable data to slower, less-expensive media is another time-honored way to reduce costs. Comparatively low-cost SATA drives can cost as little as one-tenth the price of high-performance Fibre Channel (FC) drives, while using far less power and offering much greater density than FC drives, Forrester Research's Reichman said. SATA drives can be particularly effective when used with other features such as thin provisioning and wide striping.
Storage Switzerland's Crump advises not getting bogged down in a tiered storage strategy. "Take the oldest data you have and just move it," he said. "If nobody has accessed it in two years, the chances of anyone accessing it again are between slim to none." By avoiding the cost of designing a more formal policy, he said, "you can handle the [rare] time when someone" actually needs an older file.
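Crump's rule of thumb is simple enough to script. The Python sketch below lists files nobody has touched in two years; it assumes the file system records access times reliably (volumes mounted with access-time updates disabled will not), and the function name and the 730-day default are illustrative choices rather than anything Crump prescribed.

```python
import os
import time


def stale_files(root, max_idle_days=730, now=None):
    """Yield (path, size) for files not read or modified in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Use the later of last access and last modification.
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield path, st.st_size
```

Summing the sizes that come back gives a quick estimate of how much capacity a bulk move to cheaper disk would free up, before anyone spends time designing a formal tiering policy.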
Within the next 12 months, Gartner's Cox said, vendors will start allowing customers to move not just volumes but individual pages within a data set to slower, less-expensive storage as those pages become less important or go longer without being accessed. Automated tiered storage at this "sub-volume" level, he said, "is going to have a big impact on storage efficiency."
Wide striping
Wide striping is a variation on RAID in which data is distributed among multiple disks, using only a relatively small amount of the capacity of each disk to maximize its performance. Wide striping is particularly effective as a cost-saver when used with relatively low-cost SATA drives, Forrester Research's Reichman said, compared to using higher priced Fibre Channel drives to deliver the needed performance.
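To see why spreading data widely helps performance, consider the simplest possible layout, in which consecutive blocks rotate across every disk in the pool. The Python sketch below shows that plain round-robin mapping; it is an assumption for illustration and not the layout used by any of the vendors named here.

```python
from collections import Counter


def locate_block(logical_block, n_disks, unit_blocks=1):
    """Map a logical block to (disk index, block offset on that disk).

    unit_blocks is the number of consecutive blocks kept together
    on one disk before moving on to the next disk.
    """
    unit = logical_block // unit_blocks
    disk = unit % n_disks
    row = unit // n_disks
    offset = row * unit_blocks + logical_block % unit_blocks
    return disk, offset


def blocks_per_disk(n_blocks, n_disks, unit_blocks=1):
    """Count how many of the first n_blocks land on each disk."""
    return Counter(locate_block(b, n_disks, unit_blocks)[0] for b in range(n_blocks))
```

With 1,000 blocks spread over eight disks, `blocks_per_disk(1000, 8)` puts 125 blocks on each one, so a large read or write keeps every spindle busy instead of queuing on a single drive.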
"3PAR, Compellent and NetApp have all been doing wide striping for some time, and all of them are claiming the ability to derive high performance from SATA drives," he said. EMC and Hitachi Data Systems also provided disk pooling and wide striping when they released thin provisioning capabilities several years back, he said, but don't stress the use of SATA disks as much as other vendors.
In some "wide striping" solutions, the most frequently accessed data is stored automatically on the outer tracks of the disk so it's accessed most quickly, with other less-frequently used data stored elsewhere to make the most use of all the available capacity.
Rob DiStefano, IT systems manager at Earth Rangers Foundation, a Woodbridge, Ontario, non-profit organization, used this capability in the organization's Pillar Axiom 600 from Pillar Data Systems to boost disk utilization to 80% vs. only 40% on older network-attached storage (NAS). Using Pillar's drag-and-drop interface, DiStefano said, he was also able to reduce his administration costs by a factor of 10.
Your mileage will vary
As with all products, vendors will choose the ideal scenario for calculating how much storage they can save. In real life -- in your shop -- efficiency efforts may yield somewhat less-impressive results.
In his Dallas data center, External IT USA's Stedler has seen only a 7.5 times reduction in data vs. the 15 to 20 times reduction promised by Data Domain. But he said much of the difference is due to variations in the type of data he's deduping and that "overall, we're quite happy with its performance."
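It helps to translate reduction ratios into the share of capacity actually saved, because the two scales behave very differently. The short Python sketch below does the arithmetic; the function names and the 100 TB example figure are assumptions for illustration.

```python
def space_saved(reduction_ratio):
    """Fraction of capacity saved at a given N:1 reduction ratio."""
    return 1 - 1 / reduction_ratio


def disk_needed_tb(logical_tb, reduction_ratio):
    """Physical capacity required to hold logical_tb after reduction."""
    return logical_tb / reduction_ratio


if __name__ == "__main__":
    for ratio in (7.5, 15, 20):
        print(f"{ratio}:1 saves {space_saved(ratio):.1%}; "
              f"100 TB needs {disk_needed_tb(100, ratio):.1f} TB of disk")
```

A 7.5:1 ratio already saves about 87% of capacity, against roughly 93% to 95% at the promised 15:1 to 20:1. The savings look similar in percentage terms, yet the lower ratio needs about twice as much physical disk as 15:1, which is the number that shows up on the purchase order.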
Rich April, director of network engineering at Boston-based health care provider Harvard Vanguard Medical Associates, cited a recent conversation with a storage vendor that claimed it could reduce his storage needs by 60% to 80%. Those numbers assumed the data being deduped were mostly files; however, his older primary database environment "doesn't play nicely with these newer technologies," so he wouldn't get anywhere near those savings. He has, however, seen a 70% reduction in his backup data by switching from tape-based to disk-based backup of remote-office file shares using EMC's Avamar.
Among the factors to consider in deciding what savings you'll see are the amount and type of data in your environment; the capabilities of your existing storage network, controllers and arrays; and your requirements for application and backup performance. As StorageIO Group's Schulz said, when it comes to storage savings, "your mileage will vary" based on your specific environment. But with all the new and emerging ways to save on storage, the trip is well worth it.
BIO: Robert L. Scheier is a freelance technology writer based in Boylston, Mass. He can be reached at firstname.lastname@example.org.