
How to create a successful data archiving strategy

A good archiving process provides the automation needed to deliver the necessary application granularity while minimizing the impact to IT operations.

What you will learn: The technical tools and processes required for an effective data archiving strategy depend entirely on a company's compliance, data governance and storage management requirements.

There's a story that tells how someone once asked Abraham Lincoln how long a man's legs should be. Our 16th president reportedly replied, "Long enough to reach the ground." Similarly, when it comes to the question of how long data should be archived, the reply might be, "Long enough to be sure that it's available when you need it." This statement captures the two most critical variables of the data archive equation: time and accessibility.

Time, or more accurately the retention period, is the "tip of the spear" when it comes to matching an organization's needs with potential archiving solutions. Data retention requirements can be highly variable, often determined on an application-by-application basis. For example, all organizations must manage financial data, which generally must be retained for seven years. Human resources data may need to be retained for three years, but that regulation can vary by state. Medical data might be retained for the life of the patient plus seven years, nuclear power data for 70 years and so on.

There's a simple answer to the question of what all these time periods have in common: compliance. In most cases, the retention requirement matches the statute of limitations for a party (either governmental or private) to bring legal action against the organization. Failing to produce records demanded by a court order can lead to civil and, in some cases, criminal penalties. On the flip side, retaining records beyond the mandated period keeps them subject to legal discovery and needlessly jeopardizes the organization's legal position.

Unfortunately (or perhaps fortunately), most IT people have no legal background. So, step one in developing a data archiving strategy is to inventory the data and assign a retention schedule to it. Corporate counsel may be able to provide the necessary parameters. If the attorneys can't (and you'd be surprised how often they decline to do so), the heads of the individual departments that "own" the data might be able to supply the retention information, as they should be familiar with the regulatory environment of their area. Sometimes, attorneys and department leaders don't want to chisel a time frame in stone. In that case, IT organizations shouldn't guess. In the absence of a specific time frame, the default retention period becomes "forever." While not optimal, it may be the only option for IT managers.
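
As an illustration only, the inventory-and-schedule step might be sketched in Python as follows. The seven-year and three-year periods are the examples given earlier in this article; the "engineering" category, the DataSet fields and every function name are hypothetical, and the sketch assumes the retention clock starts at the creation date, which may not match the trigger event a given regulation specifies. A category with no period on record defaults to "forever," represented here as None.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schedule: data category -> retention period in years.
# None means no period was supplied, so the default is "forever".
RETENTION_YEARS = {
    "financial": 7,          # example period cited in the article
    "human_resources": 3,    # example period cited in the article; varies by state
    "engineering": None,     # hypothetical category with no period on record
}


@dataclass
class DataSet:
    name: str
    category: str
    created: date


def retention_years(category: str) -> Optional[int]:
    """Return the retention period in years, or None for 'forever'."""
    # Categories missing from the schedule also fall back to 'forever'.
    return RETENTION_YEARS.get(category)


def disposal_date(ds: DataSet) -> Optional[date]:
    """Return the first date the data set may be disposed of, or None."""
    years = retention_years(ds.category)
    if years is None:
        return None
    try:
        return ds.created.replace(year=ds.created.year + years)
    except ValueError:
        # Created on Feb. 29 and the target year is not a leap year.
        return ds.created.replace(year=ds.created.year + years, day=28)


def may_dispose(ds: DataSet, today: date) -> bool:
    """True only when a retention period exists and has fully elapsed."""
    due = disposal_date(ds)
    return due is not None and today >= due
```

Under these assumptions, a financial data set created on March 15, 2010, becomes eligible for disposal on March 15, 2017, while a data set in an unlisted category is never flagged for disposal.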

The term archive has been used in a rather fast and loose manner over the past several years. Archiving can refer to moving infrequently accessed data to high-capacity, low-cost disk (including tiered storage), backup to tape and offline/off-site storage. Similar to having a continuum of data protection (i.e., a mix of snapshots, replication and backup), organizations will have a data archiving continuum. This continuum will be necessary to meet the varying time frames mentioned above at a cost-effective price. Satisfying these varying needs will be balanced against complexity, and a good archiving solution will provide the automation needed to deliver the necessary application granularity while minimizing the impact to IT operations.
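
One way to picture the continuum is as a set of tiers chosen by how long data has gone unaccessed. The following Python sketch is hypothetical: the tier names echo the options listed above, but the day thresholds and the select_tier function are assumptions made for illustration, not recommendations.

```python
from datetime import date

# Hypothetical thresholds: (maximum idle days, tier), ordered from most to
# least active. Real thresholds would come from the retention schedule and
# from the cost of each tier.
TIERS = [
    (90, "primary disk"),
    (365, "high-capacity, low-cost disk"),
    (365 * 3, "tape"),
]
COLDEST_TIER = "offline/off-site storage"


def select_tier(last_access: date, today: date) -> str:
    """Pick a storage tier from the number of days since last access."""
    idle_days = (today - last_access).days
    for max_idle_days, tier in TIERS:
        if idle_days <= max_idle_days:
            return tier
    return COLDEST_TIER
```

An automated archiver would apply a rule like this on a schedule, so that data migrates down the continuum without manual intervention from IT operations.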

Data archiving benefits

IT organizations will be motivated to implement archiving as a general-purpose enhancement or for application-specific reasons. In either case, expected benefits of archiving include:

  • Reduced costs. Data archiving is largely, though not exclusively, an effort to lower costs. This is measured as $/gigabyte stored. Many vendors offer a total cost of ownership (TCO) analysis. All models are expected to yield positive results, so the results are only meaningful if you agree with both the data input and the underlying premises of the TCO model.
  • Reduced backup window. Even with backup to disk, data compression and data deduplication, backup windows face constant pressure from data growth that often exceeds 50% compounded annually. There's no point in repeatedly backing up unchanged data. Archiving can remove tens of terabytes or more of data from the backup set.
  • Compliance. As mentioned earlier, governmental requirements and legal liability are key reasons to implement a data archiving strategy. Doing so at the lowest possible cost is the trick.
  • Knowledge retention. In an era of big data, organizations are learning the value of analyzing vast amounts of data. Here, the consideration isn't cost, but the desire to gain a competitive edge in the marketplace.
  • Improved performance. By reducing the amount of data to manage, or partitioning unused data from active data, organizations may see substantial improvement in system performance.
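
The cost and backup-window points in the list above come down to simple arithmetic. The Python sketch below shows it under stated assumptions: the function names, capacities and per-gigabyte prices are hypothetical placeholders, and only the 50% growth rate is taken from the list.

```python
def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project capacity under compound annual growth (0.5 means 50%)."""
    return current_tb * (1 + annual_growth) ** years


def blended_cost_per_gb(primary_gb: float, primary_cost: float,
                        archive_gb: float, archive_cost: float) -> float:
    """Average $/GB across a primary tier and an archive tier."""
    total_gb = primary_gb + archive_gb
    if total_gb == 0:
        raise ValueError("no capacity to price")
    return (primary_gb * primary_cost + archive_gb * archive_cost) / total_gb


# Hypothetical inputs: a 100 TB backup set growing 50% a year.
backup_set_tb = projected_size_tb(100, 0.5, 3)

# Hypothetical prices: $0.50/GB on primary storage, $0.10/GB on the archive
# tier, with 60,000 GB of a 100,000 GB estate moved to the archive.
blended = blended_cost_per_gb(40_000, 0.50, 60_000, 0.10)
```

With these inputs, the backup set reaches 337.5 TB in three years if nothing is archived, and the blended cost falls from $0.50 to about $0.26 per gigabyte. As with any TCO model, the result is only as good as the inputs.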

Application-specific archiving products are tailored to deliver these benefits to specific environments. Examples include SAP, email and Oracle applications. Application-specific products are designed to know the ins and outs of the application so they can prune or separate data in a manner that optimizes the application without endangering referential integrity. General-purpose archivers aren't usually smart enough to do this. An application-specific tool may be all that's needed when data volumes don't justify a system-wide implementation, the major pain point relates to a specific application or a general-purpose product won't adequately address a given application.
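
To illustrate what "without endangering referential integrity" means in practice, the following Python sketch uses the standard sqlite3 module and a hypothetical two-table schema (orders and order_items). It is not how any particular archiving product works. The idea it shows is that a parent row and its dependent rows move to the archive together, inside one transaction, so the live database is never left with orphaned records.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical live and archive tables for the example."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER REFERENCES orders(id),
            sku TEXT
        );
        CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT);
        CREATE TABLE order_items_archive (
            id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT
        );
    """)


def archive_orders_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before cutoff (ISO 'YYYY-MM-DD'), with their items."""
    with conn:  # one transaction: commit on success, roll back on error
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE order_date < ?", (cutoff,))]
        for order_id in ids:
            conn.execute(
                "INSERT INTO orders_archive SELECT * FROM orders WHERE id = ?",
                (order_id,))
            conn.execute(
                "INSERT INTO order_items_archive "
                "SELECT * FROM order_items WHERE order_id = ?", (order_id,))
            # Delete child rows before the parent so the foreign key holds.
            conn.execute("DELETE FROM order_items WHERE order_id = ?", (order_id,))
            conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    return len(ids)
```

A general-purpose archiver that works at the file or volume level has no view of these relationships, which is why it can't safely prune data inside an application's database.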

About the author:
Phil Goodwin is a storage consultant and freelance writer.

Join the conversation

It's interesting. I was just talking with a friend about an issue he was having in organizing data for all of his clients. While I don't think he was spurred by his 'legal' needs, this article presents some interesting possibilities for people to consider when they plan data retention policies.
In a lot of cases, it's the legal needs, particularly when failing to comply is going to cost you money, that get people to look at this issue.
This is a very interesting read. One thing I would like to mention is that I have worked for various companies that have needed to archive data, and one of the considerations (in the UK and most likely in other countries too) is how long you can legally hold on to someone's data.
The Data Protection Act in the UK states that data should ‘not be kept longer than necessary for the purpose for which it was processed’. This means that if you are holding customer data, staff data or any other personal data, a retention period of 'forever' could actually get you in trouble, especially if that data gets hacked, because the company will then be responsible for explaining why it archived personal data that it had no use for, or whose use was not in line with the purpose for which it was processed. Even if this data is encrypted and secured, it is an unnecessary risk to hold personal information that is no longer needed for the business. For this reason I would strongly discourage a default retention period of 'forever' and would leave the decision to attorneys and department heads.