Essential Guide

Building a better archival storage strategy

Learn how different storage archive implementations and strategies can lead to a more efficient environment.

Introduction

Data archiving has never been a sexy concept. In general, it's considered the concern of museums and academic and special libraries: organizations that perceive value in old stuff and are typically funded differently than for-profit businesses.

That's a long way of saying that hardly anyone enters the study of computer technology or computer science with the goal of becoming a data archivist. In just about every aspect of computing, the emphasis is on the here and now -- doing things faster, more efficiently and with greater agility than in the past. Data that's no longer referenced with any frequency tends to fall off the radar. The only concern is that it not be deleted, because that could have devastating consequences. But data archiving has many benefits to offer besides regulatory compliance and historical preservation.

Based on a study of more than 3,000 corporate storage infrastructures, as much as 40% of the capacity of every disk drive spinning in a company is occupied by data that hasn't been referenced in the last month, six months or one year. Yet 7 W to 21 W of electrical power is supplied continuously to each drive to keep it spinning; in addition, when drives fail, they're replaced and rewritten with the same data from backups or as part of a RAID set rebuild. That means we're wasting both electrical power (to energize drives and bleed off the heat they generate) and staff time (to confirm data and drive integrity and to perform periodic maintenance), while building more capacity into our infrastructure to store new data year after year.
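To put rough numbers on that waste, here is a minimal Python sketch of the arithmetic. The 40% figure and the 7 W to 21 W range come from the paragraph above; the 1,000-drive fleet, the function name and the idea of attributing energy in proportion to stale capacity are illustrative assumptions, and cooling overhead is left out entirely.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def idle_data_energy_kwh(drive_count, watts_per_drive, stale_fraction=0.40):
    """Annual energy (kWh) attributed to capacity holding unreferenced data.

    Assumption: energy is split in proportion to capacity, so 40% stale
    capacity is charged with 40% of the drive's power draw.
    """
    if not 0.0 <= stale_fraction <= 1.0:
        raise ValueError("stale_fraction must be between 0 and 1")
    # watts * hours gives watt-hours; divide by 1,000 for kilowatt-hours.
    total_kwh = drive_count * watts_per_drive * HOURS_PER_YEAR / 1000.0
    return total_kwh * stale_fraction


if __name__ == "__main__":
    # Hypothetical fleet of 1,000 drives at both ends of the power range.
    for watts in (7, 21):
        kwh = idle_data_energy_kwh(drive_count=1000, watts_per_drive=watts)
        print(f"{watts} W per drive: about {kwh:,.0f} kWh per year")
```

For that hypothetical fleet, the result falls between roughly 24,500 and 73,600 kWh a year spent keeping stale data spinning, before any cooling costs are counted.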

Then there's the issue of productivity. It may seem inconsequential that a search for a string of words takes a few milliseconds longer as you clot up your file systems with more data, but multiply that by the number of searches conducted every day by all the employees, customers, and others in and outside of your firm who have permission to scan your data. You're talking about a lot of wasted time searching through the 40% of your data that's included in search results only because it's physically recorded in and around active data.

The bottom line is that the business-value case for data archiving -- based on cost containment (archive data to reduce the urgency to buy more storage capacity), risk reduction (archive data to ensure regulatory compliance) and improved productivity (archive data to get it out of the way of searches, report generation, backups and so on) -- is pretty persuasive, whether you believe the data has historical merit or not. As simple as this business case may be to understand and appreciate, archiving itself remains a mystery to many IT folk. There are myriad issues of methodology and technology to parse when you develop a vision for your archive system and a strategy for bringing it to bear on your current infrastructure.

Understanding deep archives versus active archives is one of the first hurdles in reaching an understanding of data archiving. There are many definitions and uses of the term, often skewed to support one vendor or another's products.

Here's my definition: An archive is a collection of data -- a fact everyone cites but that offers no value whatsoever to the illumination of the strategy. A backup is also a collection of data, but backups aren't archives, at least not in the sense of long-term data preservation. Backups offer short-term protection of data assets against the corruption or deletion of the data itself or the breakage of the primary storage infrastructure. Backups are cycled and updated frequently to account for the latest data assets. Ideally, archival-quality data is periodically excluded from production backups, both because its restoration isn't generally needed in an emergency and because excluding it makes the backup process work faster.

Archives are stores of files or data sets that are rarely re-referenced. These data assets are usually retained in a separate storage system with its own processes for data management and protection, able to deliver speeds and feeds appropriate for efficient data ingestion and rare data access, once written. Capacity is prized over performance, but the platform must still deliver performance adequate to the profile of reads and writes common in an archive.

Contrary to popular belief, archive systems aren't junk technology kept in service so they can perform the lowly task of storing old data no one cares about. They're part of an ecosystem of storage systems no more or less critical than your nimblest solid-state drive or hard disk drive hybrid array placed behind your most mission-critical transaction processing system. Without the archive system, the transaction system can't function cost effectively or at peak efficiency.

Still, some draw a distinction between deep archive and active archive. The former is what we have traditionally considered an archive to be: a collection of data with historical business value or specific regulatory or legal retention requirements that's seldom if ever accessed. Deep archive systems, because of their limited re-reference rates, are designed with specific attention to the container that will be used to hold data for later use and to the technology to which that container is written -- specifically, to the longevity of the media, including its interoperability with future data access technologies. It does no good to have a 20-year-old archive kept on technology that can no longer interface with contemporary servers or operating systems. That's a big concern of deep archiving.

Active archive, by contrast, refers to storing data that, once written, changes very infrequently. However, it's data that may be read a lot even though it isn't modified, so it presents a different set of storage requirements than either read-, write- and modify-intensive primary storage systems or deep archive systems, whose data is seldom if ever read or modified. Think video: Once the television episode is recorded, it won't be modified, but it might be replayed (read) many times. The video file is archival data, but it's still active (read). That's active archive in a nutshell, but it's actually another kind of primary storage.
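The distinction comes down to how often data is read versus how often it's modified, which can be sketched as a simple classification rule. This is an illustration only: the `AccessProfile` type, the `classify_tier` function and both thresholds are hypothetical and would need tuning against your own access statistics.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    reads_per_month: float
    writes_per_month: float  # modifications after the initial write


def classify_tier(profile, max_archive_writes=0.1, max_deep_reads=1.0):
    """Map an access profile to a storage tier (thresholds are assumptions)."""
    # Data that is still being modified belongs on primary storage.
    if profile.writes_per_month > max_archive_writes:
        return "primary"
    # Rarely modified but still read often: active archive.
    if profile.reads_per_month > max_deep_reads:
        return "active archive"
    # Rarely modified and seldom if ever read: deep archive.
    return "deep archive"


if __name__ == "__main__":
    tv_episode = AccessProfile(reads_per_month=500, writes_per_month=0)
    old_ledger = AccessProfile(reads_per_month=0, writes_per_month=0)
    print(classify_tier(tv_episode))  # active archive
    print(classify_tier(old_ledger))  # deep archive
```

The recorded television episode lands in the active archive because it's replayed constantly but never changed, while a retained record that nobody opens falls through to the deep archive.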

Finally, it's worth noting that archives don't just happen as data becomes stale. You need to plan them carefully. Hierarchical storage management (HSM) systems, which migrate data from faster storage to slower storage based on metadata information such as the date last accessed and the date last modified, don't create an archive in the strict sense of the term. Still, HSM can help to identify candidate data for inclusion in a deep or active archive, and it's also adept at identifying temporary data that can simply be deleted once its usefulness has been depleted.
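The kind of metadata test an HSM product applies can be approximated in a few lines. The following is a minimal sketch, not how any particular HSM product works: it walks a directory tree and reports files whose last-accessed and last-modified dates are both older than a cutoff. The function name and the one-year default are assumptions chosen for illustration.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_archive_candidates(root, min_idle_days=365, now=None):
    """Yield (path, idle_days) for files neither read nor modified recently."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Use the more recent of last access and last modification.
            last_touched = max(info.st_atime, info.st_mtime)
            if last_touched < cutoff:
                yield path, (now - last_touched) / SECONDS_PER_DAY


if __name__ == "__main__":
    for path, idle_days in find_archive_candidates("."):
        print(f"{idle_days:8.0f} days idle  {path}")
```

One caveat: many file systems are mounted with options that update access times lazily or not at all, so the date last accessed may be less reliable than the date last modified. A scan like this only nominates candidates; deciding what belongs in the archive remains a planning exercise.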

1. Technologies

Choosing the right archival storage option

When it comes to building an archive, there are a number of different technology options to choose from. But before deciding on one platform, it's important to consider the features and policies of those technologies and how they suit your needs. Whether you're working with an active archive or a deep archive, cloud or tape, understanding how data is classified and how difficult it will be to retrieve are important factors. The following links will provide examples of different archival storage technologies, as well as expert tips regarding the qualities you should be looking for in them.

Tip

Evaluating features of data archival storage options

Once you've determined the specific data archiving needs of your organization, you'll have to sift through existing products to find the best fit.

Tip

Top data archiving storage technologies

Archival storage options are expanding and becoming more resilient and less costly. Find out which technologies are making an impression.

Answer

How cloud archiving services differ

Backup expert George Crump discusses the issues involved in picking a cloud archiving service for your organization.

Answer

Email archival vs. general archival software

Brien Posey compares email archiving software with general-purpose archiving software in this Expert Response.

Answer

Using tape for active archive storage

David Hill, analyst with the Mesabi Group, discusses tape's role in active archiving.

2. Strategy

Implementing your archival storage platform

The best way to design an archive platform depends on a number of factors: How much data will you be storing? How long should data be retained? How frequently will the data be accessed? Below is a selection of expert tips and answers that provide insight on how to approach these questions, and on the best strategies to make your archival storage approach successful.

Tip

Tips for a successful archival storage strategy

A good archiving process provides the automation needed to deliver the necessary application detail while minimizing the impact to IT operations.

Tip

Archival storage technology selection: Five questions to ask

Many factors need to be considered when determining the right archiving platform for your organization. Jon Toigo highlights the most important in this tip.

Feature

Archival storage planning: Policies, products and best practices

Primary storage can be expensive and has a finite capacity. These data archiving best practices will help you decide which data to move to archival storage.

Tip

Designing deep archive storage: Top considerations

Jon Toigo explains how to determine which data should be archived and the most efficient way to migrate it for long-term storage.

Answer

Choose the right software to manage your archive

David Hill discusses considerations for choosing archive management software in this Expert Answer.

3. Efficiency

Benefits of an effective data archive

Archives are essential to containing a great deal of the data growth seen today. When implemented effectively, archival storage can boost efficiency in a number of ways: Faster storage hardware is reserved for more critical data, capacity is freed up on production storage, and data center costs can even come down. To learn more about how archival storage can provide these benefits, check out the links below.

Feature

Implement an archive to boost overall storage efficiency

A core best practice for effective storage management is using archival storage that frees up storage resources, improves performance and protects data that must be retained.

Tip

Eliminate wasted disk capacity in archival storage

Jon Toigo explains how HSM and other technologies can help make the most of disk capacity.

Tip

Three considerations for successful long-term data archiving

Jon Toigo offers insights into storage options for deep archives, and how they affect data integrity and potential technology issues.

Answer

Why archive software is necessary

Brien Posey discusses backup and archive software, as well as e-discovery, in this Expert Answer.

Tip

How active archives can help SMBs

Active archives, once thought of as practical only for large companies, are becoming more useful for small and medium-sized business customers in a variety of vertical industries.
