Prime time for secondary storage

Do you want to improve data protection and make better use of primary storage? Creating a layer of so-called second-tier disk is definitely worth investigating.

iSCSI inspires low-cost storage networks
There's been a stream of iSCSI products since the ratification of the iSCSI protocol in early 2003. Managers can now choose viable storage products from multiple vendors. iSCSI brings block-level SAN technology to smaller shops.

iSCSI enables block-level data delivery over any IP network. With an iSCSI initiator on the server, the system can send block-level data over the IP network to a storage device. Because Microsoft provides an iSCSI initiator as part of Windows "you've got SAN enablement being given away with the operating system. You don't even have to buy any [additional] hardware," says David Dale, chairperson of the Storage Networking Industry Association's IP Storage Forum.
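
For readers who want to see what that looks like from the host side, here is a minimal sketch of discovering and logging in to an iSCSI target. It assumes a Linux host with the open-iscsi command-line tools installed (Windows administrators would use Microsoft's initiator applet or its iscsicli utility instead), and the portal address and target name are hypothetical placeholders, not real devices.

```python
# Minimal sketch: attach a server to an iSCSI target using the open-iscsi
# CLI on Linux. The portal address and target IQN are hypothetical.
import subprocess

PORTAL = "192.168.10.50:3260"                      # hypothetical iSCSI appliance
TARGET = "iqn.2004-08.com.example:array0.volume1"  # hypothetical target name

def run(cmd):
    """Echo and run a command, stopping on any error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Ask the storage appliance which targets it exposes.
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])

# 2. Log in to the target; the volume then appears to the OS like any
#    locally attached SCSI disk (e.g., /dev/sdb).
run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])
```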

The poor performance for which iSCSI was once criticized is no longer a problem in an era of gigabit and multigigabit Ethernet networks and fast processors. Yes, IP adds overhead, but "IP networks today are plenty fast," Dale says, even without the help of TCP/IP offload engines. iSCSI performance problems are more likely to result from disk drive actuators unable to pull data off the spindles fast enough, he says.
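
Some rough arithmetic illustrates Dale's point. The overhead and drive-throughput figures below are assumptions for the sake of the example, not measurements, but they show why the wire is rarely the bottleneck:

```python
# Back-of-the-envelope comparison of a gigabit Ethernet link vs. a single
# disk drive. All figures are rough assumptions, not measurements.
GIG_E_MBPS = 1000 / 8            # raw gigabit Ethernet wire rate, ~125 MB/s
PROTOCOL_OVERHEAD = 0.20         # assume ~20% lost to TCP/IP and iSCSI framing
usable_net = GIG_E_MBPS * (1 - PROTOCOL_OVERHEAD)

DRIVE_SUSTAINED_MBPS = 50        # assumed sustained rate of one SATA drive

print(f"Usable iSCSI bandwidth on GbE: ~{usable_net:.0f} MB/s")
print(f"One drive's sustained rate:    ~{DRIVE_SUSTAINED_MBPS} MB/s")
print(f"Drives needed to saturate link: ~{usable_net / DRIVE_SUSTAINED_MBPS:.0f}")
```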

With iSCSI, midsize shops and satellite offices of large companies can achieve the benefits of consolidated networked storage. "Fibre Channel was the first phase of consolidation for enterprise mission-critical applications. With iSCSI, we can have a second phase for Wintel servers, which will be used for file consolidation," says Michael Peterson, president, Strategic Research Corp.

Mesirow Financial, a Chicago-based diversified financial firm with 800 employees and 11 offices, found itself with a large number of Windows servers and the problem of administering DAS for each. "A SAN was too expensive," recalls Jay Walusek, vice president of server administration at Mesirow. After Microsoft released its iSCSI initiator, the company researched an iSCSI SAN, settling on an iSCSI appliance from EqualLogic with 14 SATA disk drives. The appliance "virtualizes the storage for us and provides data replication at no additional cost," he adds.

Mesirow's next step is to add a second iSCSI appliance in a suburban office and run a gigabit Ethernet connection between them. "That will allow us to replicate data off site," Walusek says. The company also wants more SATA arrays for disk-to-disk or disk-to-disk-to-tape backup. Making this all possible is iSCSI, asynchronous IP connectivity and SATA disk arrays. "We get a fully populated, redundant iSCSI SAN for about $40,000 to $50,000," he says.

By definition, the word "secondary" suggests some degree of inferiority, but in a storage area network (SAN), secondary storage can bring big savings and smoother operations. Toss out the derogatory connotations--lower cost disks with less than breathtaking performance are playing an increasingly important role in the storage architecture.

Lower cost, of course, can be relative. Although ATA and Serial ATA (SATA)--i.e., "cheap disks"--often come to mind when secondary storage is the topic, the term doesn't necessarily define a product category.

Depending on a company's business environment and the currently installed equipment, secondary storage may just as easily be an older network-attached storage (NAS) box or a SAN array, basically any storage device that performs at a lower level than what's required for critical applications. Of course, it's prudent to keep an eye on TCO; maintaining older, lower performing storage may actually cost more than simply replacing it.

For most companies, placing a second class of disk storage between their primary storage and their archival tape library is the first step toward a tiered storage environment. Pair second-tier storage with a second-tier network--iSCSI--and the vision of a storage network architecture that can meet most of the cost/performance scenarios in the enterprise begins to take shape (see "iSCSI inspires low-cost storage networks"). But effectively deploying secondary storage requires an understanding of the nature of the data: its value, how often it's used and how frequently it's accessed.

Add to lower-cost the notion of migrating data from primary storage, and this new layer of disk extends the old idea of hierarchical storage management (HSM). Instead of HSM's nearline layer--usually easily retrievable tape or writable optical--secondary disk storage brings instant online access and the ability to control whether data can be rewritten or not.

Why tier?
There are two motivations for adding secondary storage to an environment: improving data protection and making more cost-effective use of installed storage systems. From an applications perspective, these two criteria break down further, addressing more specific storage activities:

  • Creating a more effective disaster recovery plan that allows data to be copied and recovered faster
  • Dramatically improving the backup process to gain freedom from backup window constraints
  • Making better use of primary storage by moving less-frequently used or less-critical data from expensive arrays
  • Maintaining older information in an easily accessible form to satisfy business and regulatory requirements
The concept of migrating data isn't new. HSM has been used in the mainframe world for decades, but in open-systems environments, migrating data from one type of array to another has heightened the importance of secondary storage. By using online storage resources, secondary storage extends the HSM concept.

"The majority of people up until recently have always treated all their storage the same," says Nancy Hurley, a senior analyst at the Enterprise Strategy Group (ESG), a Milford, MA-based analyst firm. But by moving data from one class of disk to another based on business criteria, companies have taken a significant step toward an information lifecycle management (ILM) system. Most companies first consider a secondary storage scheme to improve their data protection, adds Hurley.

Jim Geis, director of storage solutions at Forsythe Technology, a Skokie, IL, consulting firm, says that for many organizations, secondary storage provides a way to create multivendor shops. Multiple vendors typically meant multiple tools to manage storage, but Geis says that the maturation of management tools has made mixing vendor products much easier. Regulatory compliance has also forced storage shops to consider secondary storage, according to Geis. He says that because some regulations require almost immediate access to and a quick recovery of data "tape is not really an option."

SATA/ATA spurs interest
Clearly, more than a few planets had to align to put secondary storage in the orbit of storage managers, but the emergence of low-cost disks may be the prime mover. ATA and SATA disks are clear-cut alternatives to higher performing, more reliable--but much more costly--Fibre Channel (FC) storage.

At last spring's Storage Networking World conference, International Data Corp. (IDC) predicted that by 2007, 22% of all enterprise drive purchases would be SATA, accounting for 41% of the total terabytes purchased that year--a significant jump from 2004, when SATA is expected to account for less than 10% of drive purchases. IDC's prediction underscores the attraction of SATA/ATA disk, which today offers higher capacity than FC disks at 30% to 50% less cost.

The reliability issue is knottier. Regardless of the performance level required, any corporate data that's retained has intrinsic value and must be protected. One reason SATA disk prices can be kept so low is that the drive manufacturers do less testing on them than on typical enterprise-class drives. Reliability uncertainties may be the greatest impediment to SATA's acceptance. Forsythe's Geis says his clients' attitude about SATA can best be summed up as, "SATA is still wait and see."

Large storage vendors, such as Hitachi Data Systems (HDS) and Sun Microsystems Inc., are taking steps to help improve the reliability of SATA disks, or at least to take some of the unpredictability out of their failure. HDS recently announced its Serial ATA Intermix Option, which allows SATA disks to be mixed with FC disks in the same array. HDS is also taking a number of steps to improve the reliability of its SATA disks, including performing a read verify of every byte written to disk and continuously monitoring the disks with head sweeps.
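
The read-verify idea itself is straightforward. The sketch below is a generic host-side illustration of the technique--write a block, read it back and compare checksums--not HDS' firmware, which does the equivalent inside the array, against the media itself:

```python
# Generic read-after-write verification: write a block, flush it, read it
# back and compare checksums before trusting the write.
import hashlib, os

def verified_write(path: str, offset: int, block: bytes) -> bool:
    expected = hashlib.sha1(block).hexdigest()
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(block)
        f.flush()
        os.fsync(f.fileno())          # push the data out to the device
    with open(path, "rb") as f:
        f.seek(offset)
        readback = f.read(len(block))
    # Note: on a host, the read-back may be served from cache; an array's
    # firmware verifies against the disk media itself.
    return hashlib.sha1(readback).hexdigest() == expected

# Example: verify a 4KB block written at offset 0 of a scratch file.
with open("scratch.dat", "wb") as f:
    f.write(b"\0" * 4096)
print(verified_write("scratch.dat", 0, os.urandom(4096)))
```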

Chris Wood, a director of technical sales and marketing at Sun, says SATA reliability is a "very significant issue." Wood says that when a disk fails, it usually occurs during its first 30 days of use, and that failure rate is higher when the disk manufacturers perform a shorter burn-in cycle, as is often the case with SATA disks. So, Sun adds a longer burn-in cycle and provides monitoring tools for SATA disks in service. For the future, the company is also looking at installing extra hot spares in every array that can be turned on when needed.

Most SATA arrays will be RAID protected, which will mitigate many of the reliability concerns, providing ample insurance if a drive fails. A second drive failure while the first is rebuilding could be a problem, but vendors are already addressing this somewhat unlikely occurrence. Network Appliance Inc. introduced double parity RAID (RAID-DP) with version 6.5 of its Data Ontap operating system. RAID-DP adds a second diagonal parity stripe and another parity disk to a RAID set to further safeguard the disks.
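
RAID-DP is NetApp's implementation of row-diagonal parity. The toy sketch below (small integers standing in for disk blocks) illustrates the general scheme rather than NetApp's code: one parity disk protects each row with ordinary XOR parity, and a second disk protects diagonals that cut across both the data and the row-parity disk, which is what lets a RAID set survive two simultaneous disk failures.

```python
# Toy illustration of row-diagonal (double) parity, the scheme RAID-DP is
# based on. Integers stand in for blocks; XOR (^) stands in for parity.
from functools import reduce
from operator import xor
import random

p = 5                                   # a prime; gives p-1 = 4 data disks
rows = p - 1                            # blocks per disk in one stripe
data = [[random.randrange(256) for _ in range(rows)]
        for _ in range(p - 1)]          # data[disk][row]

# Row parity disk: XOR of each row across the data disks (plain RAID 4).
row_parity = [reduce(xor, (data[d][r] for d in range(p - 1)))
              for r in range(rows)]

# Diagonal parity disk: XOR along diagonals that cross the data disks AND
# the row-parity disk. Block (disk i, row r) sits on diagonal (i + r) % p;
# one diagonal is deliberately left uncovered -- that asymmetry is what
# makes reconstruction after any two disk failures possible.
stripe = data + [row_parity]            # disks 0..p-1 (row parity last)
diag_parity = []
for dgl in range(p - 1):
    members = [stripe[i][r] for i in range(p) for r in range(rows)
               if (i + r) % p == dgl]
    diag_parity.append(reduce(xor, members))

print("row parity disk:     ", row_parity)
print("diagonal parity disk:", diag_parity)
```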

Another approach to improve reliability involves adding dual-port capability to single-ported SATA drives to provide redundant data paths. This technique is used by Engenio Information Technologies Inc. (formerly LSI Logic) in its SATA storage systems. Hewlett-Packard Co. (HP) is taking a similar tack with Fibre-Attached Technology Adapted (FATA), which it developed with Seagate Technology. FATA provides a dual-port FC interface for SATA drives, which improves reliability and allows mixing low-cost and FC drives in a single HP EVA array. These enhancements may kick up the cost of SATA disk systems a bit, but they will still offer substantial cost savings over their FC counterparts.

What's driving secondary storage?
There are many factors that contribute to the feasibility of adding secondary storage in a tiered architecture:

  • Growing data stores on primary storage
  • Continuing high prices for high-end storage vs. lower-priced ATA and Serial ATA (SATA) disks
  • Greater capacities available with ATA/SATA disks
  • Expectations that future generations of SATA will match or surpass Fibre Channel performance
  • Backup inefficiencies using tape
  • Regulatory compliance
  • The need to better control data management
  • Business continuity/disaster recovery
  • Availability of low-cost connectivity
  • Virtualization, which makes it easier to mix arrays and add lower-cost disks

Making data safer
Today, secondary storage is most often used to bolster data protection activities. Traditional backup to tape has significant shortcomings, most notably the time required for daily and weekly backups and the difficulties of recovering archived data from tape. With far greater ease of use and faster access speeds, secondary storage is an ideal remedy for backup woes.

Industry statistics support the practicality of inserting some form of disk-based backup into typical backup scenarios. Most analysts say that approximately 75% of data that needs to be recovered is new, having been accessed or created within the last 72 hours. They also note that approximately 90% of recovered data was created or last accessed within the previous two weeks. "Statistically, most people don't ever recover data after 21 days," says ESG's Hurley.

Those statistics help make a good case for using secondary storage. Because of the low cost of secondary storage, some companies have opted to eliminate tape altogether and back up directly to disk. Secondary storage system prices are competitive with many tape backup devices, although disk-based systems don't provide the kind of portability that tape does for storing backup copies off site. D2D backup does, however, compensate with speed--for both backing up and recovering data. Transportability of backup disks is getting more attention, with new products such as Spectra Logic Corp.'s Spectra RXT, a removable tray that holds up to 1TB of SATA disks in a ruggedized case that can be mounted in the company's Spectra T950 tape library.

At MIT's Lincoln Laboratories, Lexington, MA, the storage group uses low-cost Winchester Systems ATA arrays to speed the backup of an Oracle database. "By copying the files to the ATA array, we can cut down the backup time significantly, from 18 hours with tape to 1.5 hours with disk," says John Riopel, systems manager.

Such D2D backup is becoming increasingly attractive as the cost of ATA disk drives steadily drops. The economics of ATA make a big difference to storage managers like Tom Schultz, chief engineer at the Enterprise Imaging Group of Partners Healthcare, Boston. "I can get a 1TB ATA disk array from a midtier vendor for $4,000. An FC SCSI array would cost as much as two to five times more," he reports.

Partners is able to buy sufficient ATA disk to store three years' worth of compressed images (approximately 25TB of data) before it is archived to tape. With Fibre Channel SCSI, the imaging group could only afford to keep 18 months of data online before going to tape.
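
A rough calculation based on Schultz's quoted figures shows why the choice was easy; the FC multipliers simply bracket his two-to-five-times range:

```python
# Rough cost comparison using the figures quoted above: about $4,000 per
# terabyte for midtier ATA, with FC SCSI at two to five times that price.
ata_per_tb = 4_000
fc_multipliers = (2, 5)
capacity_tb = 25                      # ~3 years of compressed images

print(f"25TB on ATA: ${capacity_tb * ata_per_tb:,}")
for m in fc_multipliers:
    print(f"25TB on FC at {m}x: ${capacity_tb * ata_per_tb * m:,}")
# For the same budget as 25TB of ATA, FC at even the low-end 2x price buys
# only about 12.5TB -- roughly the 18 months of online images noted above.
```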

Carol Braden, manager of data management for Trilegiant Corp., a membership club marketing firm based in Norwalk, CT, recently installed ATA disks in the company's EMC Corp. CX600. The new disks will be used to do point-in-time backups of an Oracle database using Oracle's Recovery Manager. "Backup will run faster and have less impact on your users," says Braden. "Any restores you have to do for local outages will be a lot faster because you don't have to go to tape." Trilegiant's setup uses the ATA disk as a staging area, with the backup data on disk eventually moving to tape.

The proliferation and enhancement of snapshot technologies also gives secondary storage a boost. Snapshots provide the dual benefit of quicker initial backups and easy access to saved data, while easing the constraints of traditional backup windows. Snapshot software that works across vendor and product lines is widely available, making it easier to mix and match storage devices according to function. Frequent snapshots of production data can be saved on secondary disk before backing up the data to tape for long-term archival. This makes it easy to satisfy data restore requests, which typically occur when the data is still fresh, and the backup-to-tape process can run at any time without hindering operations.
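
A minimal sketch of that staging pattern follows: recent backup copies sit on the secondary disk tier for fast restores, and older copies are swept to the tape queue on whatever schedule suits operations. The directory names and the 14-day window are hypothetical examples, not a product's defaults.

```python
# Sketch of a disk-staging backup policy: keep recent backup copies on a
# secondary disk tier for fast restores, sweep older copies to the tape
# queue. Paths and the 14-day window are hypothetical.
import shutil, time
from pathlib import Path

STAGING = Path("/secondary/backup_staging")   # ATA/SATA tier
TAPE_QUEUE = Path("/tape_queue")              # picked up by the tape backup job
KEEP_ON_DISK_DAYS = 14                        # most restores happen inside 2 weeks

def sweep_to_tape():
    cutoff = time.time() - KEEP_ON_DISK_DAYS * 86400
    for copy in STAGING.iterdir():
        if copy.stat().st_mtime < cutoff:
            # Old enough that a disk restore is unlikely; hand the copy to
            # the tape process and reclaim the secondary disk space.
            shutil.move(str(copy), str(TAPE_QUEUE / copy.name))

def restore(name: str, destination: Path):
    disk_copy = STAGING / name
    if disk_copy.exists():
        shutil.copy2(disk_copy, destination)   # fast path: restore from disk
    else:
        raise FileNotFoundError(f"{name} has been swept to tape; recall it there")

# sweep_to_tape()  # typically run on a schedule, after backups complete
```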

Even without using snapshots, secondary storage can measurably speed up backups. Aetna, the insurance giant based in Hartford, CT, is using disk pooling and NetBackup to stage its backups for its Unix environment. Aetna uses approximately 10TB of what Nancy Guerin, head of storage management for mainframe and Unix servers, describes as "non-enterprise class" storage on an EMC Clariion, although it's still part of its overall FC environment. "Looking at the price differential," says Guerin, explaining the company's decision to eschew ATA or SATA disks, "it wasn't worth giving up some of the reliability points." Guerin adds, however, that they will consider ATA disks when they look at e-mail and database archiving.

Placing value on data
The key to any data archiving plan is to understand the nature of the data, so that when data is moved from one class of storage to another, the level of service will still be acceptable. The data itself will dictate how it may be archived, but some criteria to keep in mind include:

Application. The type of application will bear strongly on how its data may be archived; critical applications, such as CRM, may require maintaining data on higher performing primary storage.
Frequency of use. Generally, older data is infrequently accessed and is less important.
Criticality to operations. Even if the data isn't associated with a key corporate system, its availability may still be critical to certain business applications.
Expiration dates. Some data fades away naturally; other data may need to be retained for certain periods of time.
Regulations. Legal regulations may dictate what data must be retained on storage systems that provide fast and easy access.

Using disk more efficiently
The inexorable growth of corporate data has prompted many storage managers to move data to secondary storage to ease the strain on primary storage systems. The obvious benefit is that the more expensive primary storage is freed up to accommodate the applications requiring that class of storage. Reclaiming disk space that's being used for less-than-critical applications or to hold infrequently accessed data will help delay new disk purchases, or even avoid them altogether.

Nielsen Media Research of New York employs a sophisticated tiering system to allocate much of its vast amount of installed storage which, all told, adds up to about 1.2PB. Robert Stevenson, a technology strategist for Nielsen in its Oldsmar, FL, operations facility, says the company has three tiers of storage and is currently developing a fourth. The tiers are defined by price per gigabyte and matched to applications based on the level of service required. "Each of those tiers has different storage price ranges," says Stevenson.

The top tier costs users approximately $20 to $40 per gigabyte; tier two is priced at $15 to $30; and the third tier is $10 to $20 per gigabyte. "The application dictates the type of tier, but there is some blur," says Stevenson, "because tier one and two tend to be pretty similar in terms of throughput" until the number of host servers rises.
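
Using the midpoint of each of those price bands, a hypothetical set of applications shows how quickly tier choice shows up on a business unit's bill. The application names, sizes and assignments below are invented for illustration; only the per-gigabyte ranges come from Stevenson's figures.

```python
# Hypothetical illustration of matching applications to price-banded tiers.
# Tier prices are the per-gigabyte ranges quoted above; the applications,
# sizes and tier assignments are invented for the example.
TIER_PRICE = {"tier 1": (20, 40), "tier 2": (15, 30), "tier 3": (10, 20)}  # $/GB

apps = [
    # (application, size in GB, tier dictated by required service level)
    ("order-entry OLTP", 2_000, "tier 1"),
    ("data warehouse",   8_000, "tier 2"),
    ("file shares",     15_000, "tier 3"),
]

for name, size_gb, tier in apps:
    low, high = TIER_PRICE[tier]
    mid = (low + high) / 2
    print(f"{name:18s} {size_gb:>7,} GB on {tier}: ~${size_gb * mid:,.0f}")
```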

Relegating an application to secondary storage can be tricky, sending storage managers down a perilous path fraught with office politics because it requires setting a value on the application and its data. But placing all application data on primary disk poses an even greater risk of paying far more for storage than necessary. Putting the costs up front, as Nielsen does, shifts much of the burden of the which-disk decision to the business units, which ultimately should be able to make the best decision based on the company's interests.

Another approach to matching data value to disk cost involves the use of archiving applications. Database archivers such as Princeton Softech's Archive and OuterBay's LiveArchive delve into databases and, using preset policies, identify data that can be moved to secondary storage.

Thinning out the data in databases stored on primary disk yields three main benefits:

  • Primary disk space is freed, providing "growing room" for the application or other applications
  • Purchases of new primary disk can be avoided or delayed
  • The database applications should perform better
The key factor is the assumption that the less frequently used data can be served adequately from lower performing disk. A large Pacific Northwest insurance and investment firm that's using data aging in its tiered environment recently added a Nexsan ATAboy array for secondary storage. "The Nexsan box is used solely as a repository for archived or infrequently accessed files," says Jeff Woodard, a storage architect/designer for the insurance company. The firm is using Arkivio Inc.'s auto-stor, a storage utilization and data management application, to move user files to the Nexsan device. "It's unstructured data; it's pretty much everything you would expect users to be putting in network shares--office files, dot-xls, dot-docs, pdf files."

Woodard adds that Arkivio has been easy to use, and they had to do little more than tailor policies by choosing from auto-stor's options. And to ensure that the data is protected, no files are moved to secondary storage unless they've been previously backed up. "So far, it's been right on target with what I projected," says Woodard regarding the effectiveness of the archiving project.
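
Arkivio's policies are configured through the product itself, but the underlying logic is easy to picture. The sketch below is a simplified stand-in, not the vendor's code: it walks a network share, selects common office file types that haven't been accessed in a given number of days, skips anything not yet backed up (represented here by a hypothetical is_backed_up check) and moves the rest to the secondary array. Paths and the age threshold are examples only.

```python
# Simplified stand-in for an age-based file migration policy: move office
# documents that haven't been accessed in N days from a network share to a
# secondary-storage array, but only if they have already been backed up.
# Paths, the age threshold and is_backed_up() are hypothetical.
import shutil, time
from pathlib import Path

SHARE = Path("/shares/users")              # primary file-server storage
ARCHIVE = Path("/nexsan/archive")          # secondary ATA array
AGE_DAYS = 180
EXTENSIONS = {".xls", ".doc", ".ppt", ".pdf"}

def is_backed_up(path: Path) -> bool:
    """Placeholder: ask the backup catalog whether this file has a copy."""
    return True

def migrate():
    cutoff = time.time() - AGE_DAYS * 86400
    for f in SHARE.rglob("*"):
        if (f.is_file()
                and f.suffix.lower() in EXTENSIONS
                and f.stat().st_atime < cutoff      # last access, not modify
                and is_backed_up(f)):
            target = ARCHIVE / f.relative_to(SHARE)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))        # data now lives on cheap disk

# migrate()  # typically scheduled to run after the nightly backup completes
```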

Using data aging as the criterion for use of secondary storage assumes that data loses value over time and, therefore, won't be accessed as often. But old data--even if it's infrequently accessed--may still be a valuable reference resource for business units. (See "Placing value on data") Data warehouses are good examples of where data aging and access frequency may be poor criteria for migration to secondary storage. Guerin notes that Aetna's data warehousing application is "a highly parallel data warehouse environment where performance is critical." She says the company is not considering secondary storage because "less frequently accessed is hard to determine in those environments."

While the current crop of archiving applications adds standardization and automation to archiving to secondary storage, true automation tools are still to come. In the meantime, a solid understanding of the data that you're working with is the most important ingredient to an effective ILM/secondary storage environment.

All in one box
One of the more interesting developments in secondary storage is the trend toward putting primary and secondary disks in the same array. "It's absolutely happening," says Sun's Wood of mixing primary and secondary storage in a single box. "There's one thing to manage--one consistent interface, not two."

HDS allows the mixing of FC and SATA disks in one Lightning subsystem, and EMC and HP have similar offerings. "The idea of being able to tier within an array is interesting," says ESG's Hurley. "You're not forcing people to buy two separate solutions." She also cites startup Compellent's Storage Center array which, along with an innovative system of allocating disk only when applications write data, allows mixing of FC, iSCSI and SATA disks. Moving data among unlike storage devices can still be difficult, so having different classes of storage in a single box makes tiering storage far easier.

At least one company is taking the mixed box concept a step further, according to Hurley. She says that Pillar Data, although it hasn't announced a product yet, is rumored to be working on technology that tiers storage on a single disk. Pillar's technology is based on the fact that data on the outside edge of a disk can be accessed faster than the data on the inner sections. "They've actually figured out how to tier at the component level," says Hurley.

Getting started
Successful secondary storage implementations start with a solid plan. Forsythe's Geis says you need to have a good grasp on the driving force behind your secondary storage plans: "Is it cost? Is it management?" He adds that other issues must be considered, too, such as how the secondary storage will integrate with the current infrastructure and management system, and ensuring that it works with the backup and recovery processes already in place.

Secondary storage has a better chance of being a cost or time saving factor if the software associated with it is policy-driven and as automated as possible. And as SATA disks get faster and more reliable, secondary storage will continue to grow. There will always be a place for high-priced FC drives, but secondary storage is no longer a second-class citizen in the data center.

This was first published in August 2004
