Whereas storage area network (SAN) storage emphasizes performance, archival storage relies on low-cost, high-capacity SATA drives and uses a combination of RAID and traditional backups to protect data against disk failure. Some archives are little more than "dumb" disk arrays, but more sophisticated archives provide data deduplication for single-instance storage, robust power-conservation features and immutability for data that may be needed as evidence in litigation.
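Single-instance storage is commonly implemented by keying each stored object on a cryptographic hash of its contents, so identical files consume space only once. The sketch below is a minimal illustration under stated assumptions: an in-memory store, SHA-256 as the hash, and a hypothetical `SingleInstanceStore` class. It does not describe how any particular archiving product works.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory store that keeps one copy of each unique content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # SHA-256 digest -> content, stored once
        self._names: dict[str, str] = {}    # file name -> digest of its content

    def put(self, name: str, data: bytes) -> str:
        """Archive a file; content already present is not stored again."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._names[name]]

    def unique_bytes(self) -> int:
        """Bytes actually consumed after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
store.put("memo.txt", b"quarterly results")
store.put("memo_copy.txt", b"quarterly results")  # identical content, stored once
store.put("notes.txt", b"meeting notes")
print(store.unique_bytes())  # 30 (17 + 13) rather than 47
```

A production system would typically also track how many names point at each blob, so shared content is released only when its last reference is deleted.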
The four chapters in this guide list the buying points and product specifications for archiving tools in the areas of content-addressed storage (CAS), email archiving, index and search, and policy manager products. But first, let's look at the eight criteria for evaluating products associated with data archiving initiatives.
Which data requires archiving? Not all data belongs in an archive. Before purchasing any archiving product, you should perform data classification, which will tell you what data exists in your organization and which data types should be protected in an archive to meet regulatory compliance and everyday business needs. Data classification should not be left to IT alone. Human resources, legal, accounting and other key departments should be asked to identify the important applications and file types. Exchange Server records, patient records or medical imaging files may be appropriate for an archive, but marketing presentations or users' MP3 files are probably not. Another issue is how long each data type must be retained. Knowing what you need to keep and how long to keep it will help you determine storage requirements and establish scalability requirements for archive management tools.
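The outcome of a classification exercise can be captured as a rule table that maps data types to archive decisions. In the minimal sketch below, classifying by file extension is a simplification, and the extensions, class names and retention periods are hypothetical placeholders rather than regulatory guidance; in practice the rules would come from the departments that own the data.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class RetentionClass:
    name: str
    archive: bool
    retain_years: int  # 0 when the data is not archived


# Hypothetical rules agreed with HR, legal, accounting and other departments.
RULES = {
    ".pst": RetentionClass("email", archive=True, retain_years=7),
    ".dcm": RetentionClass("medical-imaging", archive=True, retain_years=10),
    ".pptx": RetentionClass("marketing", archive=False, retain_years=0),
    ".mp3": RetentionClass("personal-media", archive=False, retain_years=0),
}


def classify(path: str) -> Optional[RetentionClass]:
    """Look up a file's class by extension; None means no rule matched."""
    return RULES.get(PurePath(path).suffix.lower())


for name in ["mailbox.PST", "scan_0042.dcm", "launch.pptx", "song.mp3", "readme.txt"]:
    print(name, classify(name))
```

Files that match no rule (such as `readme.txt` here) return `None`, which is a useful signal that the classification is still incomplete. Summing file sizes per class would then give a first estimate of the capacity the archive needs.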
Does the archiving product accommodate retention and deletion requirements? You cannot evaluate an archiving product without reviewing how it handles data retention and deletion. The archiving tool, as well as the software tools supporting the archive, must be able to operate for the full required retention period. Retention periods for electronic data are often the same as those for similar paper-based records and documents. For example, if paper-based employment records must be kept for seven years, their electronic equivalents are often retained for the same period. Four caveats related to retention:
- Be sure to identify an appropriate means of deleting data.
- Do not keep data past its accepted deletion date (unless it is being held for litigation purposes).
- Ensure that you can confirm deletion in a manner acceptable to your compliance environment.
- Remember that changes to retention periods will affect data that has already been archived.
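The retention caveats above can be expressed as a small disposition check. This is a minimal sketch under assumptions: the `Record` type and the `disposition` and `purge` functions are hypothetical, the retention clock starts on the archive date (real triggers, such as an employee's termination date, vary), and the deletion log is a plain list rather than a tamper-resistant store.

```python
from datetime import date
from typing import NamedTuple


class Record(NamedTuple):
    record_id: str
    archived_on: date
    retain_years: int
    legal_hold: bool = False


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(rec: Record, today: date) -> str:
    """Return 'hold', 'delete' or 'retain' for one archived record."""
    if rec.legal_hold:
        return "hold"  # a litigation hold overrides the deletion date
    expires = add_years(rec.archived_on, rec.retain_years)
    return "delete" if today >= expires else "retain"


def purge(records: list[Record], today: date, deletion_log: list[str]) -> list[Record]:
    """Drop expired records and log each deletion so it can be confirmed later."""
    kept = []
    for rec in records:
        if disposition(rec, today) == "delete":
            deletion_log.append(f"{today.isoformat()} deleted {rec.record_id}")
        else:
            kept.append(rec)
    return kept


log: list[str] = []
records = [
    Record("emp-001", date(2001, 3, 1), 7),
    Record("emp-002", date(2001, 3, 1), 7, legal_hold=True),
    Record("emp-003", date(2004, 6, 15), 7),
]
remaining = purge(records, date(2008, 3, 1), log)
print([r.record_id for r in remaining])  # ['emp-002', 'emp-003']
print(log)  # ['2008-03-01 deleted emp-001']
```

Because the expiry date is computed from each record's current `retain_years`, changing that value changes the fate of records already in the archive, which is the effect the last caveat warns about: with a 10-year period, `emp-001` would be retained rather than deleted.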
What is the level of integration and automation? Storage administrators cannot migrate, track and delete every file manually, so any archiving product must provide automated features. Indexing tools should be able to add meaningful metadata to each file automatically and integrate with search tools that can sift through that metadata to locate files requested by users. Policy manager tools should be able to apply migration and retention policies across file types while restricting certain data types to certain tiers. Because the policy manager directs the tools that migrate aging data between storage tiers, and also guides retention and deletion activity, it requires tight integration with those tools.
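A policy manager's tiering rules can be pictured as a table of age thresholds per data type, with some types barred from certain tiers. The sketch below assumes hypothetical tier names, data types and thresholds, and it only plans the moves; carrying them out would be the job of a separate data mover.

```python
# Hypothetical policy: for each data type, (minimum idle days, tier) steps listed
# in order of increasing age. A type with no "archive" step is restricted from that tier.
POLICY = {
    "email": [(0, "primary"), (90, "nearline"), (365, "archive")],
    "cad-drawing": [(0, "primary"), (180, "nearline")],
}
DEFAULT = [(0, "primary")]


def target_tier(data_type: str, idle_days: int) -> str:
    """Return the deepest tier whose idle-day threshold the file has reached."""
    tier = "primary"
    for min_idle, candidate in POLICY.get(data_type, DEFAULT):
        if idle_days >= min_idle:
            tier = candidate
    return tier


def plan_moves(files: list[tuple[str, str, int, str]]) -> list[tuple[str, str, str]]:
    """Return (path, current tier, target tier) for every file that should move."""
    moves = []
    for path, data_type, idle_days, current in files:
        target = target_tier(data_type, idle_days)
        if target != current:
            moves.append((path, current, target))
    return moves


files = [
    ("q1-2007.pst", "email", 400, "primary"),
    ("bracket.dwg", "cad-drawing", 400, "primary"),
    ("inbox.pst", "email", 5, "primary"),
]
for move in plan_moves(files):
    print(move)
```

Keeping the policy as data rather than code lets administrators adjust thresholds or tier restrictions without touching the migration logic.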
What is the level of interoperability and heterogeneity? New archive storage systems must interoperate with tools such as policy managers and data movers, and new software tools should offer the heterogeneity needed to support the current archive hardware. The automated features of the archiving hardware and software must work together seamlessly, and lab testing is the most reliable way to confirm that they do.
Longevity of the archive technology, media and tools. Archiving poses problems of long-term standardization and natural media degradation. Media may retain data reliably for only 10 years, and tapes written today will probably not be readable on the standard tape drives available 20 years from now. A similar problem exists with optical discs (CDs and DVDs) and all types of hard drives. Organizations face a dilemma: either keep old equipment in order to read old media, or periodically refresh the data (e.g., rewrite optical discs or hard drives) onto whatever new media standard is in use. Although backward compatibility is easier to maintain in software, changes to the tools can also render older archives unreadable. A version of email archiving software released in 2028 may not be able to read Exchange archives produced today.
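The refresh option implies a schedule. Taking the 10-year reliable media life mentioned above and assuming a hypothetical two-year safety margin before that limit, the function below lists the years in which archived data would need to be rewritten to new media to outlast its retention period. The function name and the margin are illustrative assumptions.

```python
def refresh_years(written: int, retain_years: int, media_life: int, margin: int = 2) -> list[int]:
    """Years in which data should be copied to fresh media.

    Each copy is assumed to be made `margin` years before the current media
    reaches the end of its reliable life; copying stops once retention ends.
    """
    interval = media_life - margin
    if interval <= 0:
        raise ValueError("margin must be smaller than media_life")
    end = written + retain_years
    years = []
    year = written + interval
    while year < end:
        years.append(year)
        year += interval
    return years


print(refresh_years(2008, 25, media_life=10))  # [2016, 2024, 2032]
```

Under these assumptions, a record written in 2008 and kept for 25 years needs three rewrites, while a seven-year record needs none; each rewrite is also an opportunity to move the data to a current format.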
Backup strategies. Archives are not backups. The files on a disk-based archive may be the only working copies of that data in the enterprise. Although disk-based archives rely on RAID for general data protection, archive platforms are typically still included in the backup process. An established archive may be fully backed up to tape every few months, with delta differencing used to protect changes to the archive on a daily or weekly basis. Data reduction techniques, such as data deduplication, can reduce the total size of the archive and speed up the backup process. Bottom line: Find the most effective means of protecting your archival data.
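Delta differencing can operate at the file or block level. The sketch below assumes the simpler file-level variant: it compares SHA-256 manifests taken at the last full backup and today, and lists what an incremental backup would need to capture. The `manifest` and `delta` functions are hypothetical names for illustration.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def delta(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List the files an incremental backup must capture since `previous`."""
    return {
        "added": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "deleted": sorted(p for p in previous if p not in current),
    }


full = manifest({"a.pdf": b"v1", "b.pdf": b"v1"})    # taken at the last full backup
today = manifest({"a.pdf": b"v2", "c.pdf": b"new"})  # taken today
print(delta(full, today))
# {'added': ['c.pdf'], 'changed': ['a.pdf'], 'deleted': ['b.pdf']}
```

Only the added and changed files need to be copied to tape; recording deletions as well lets a restore reproduce the archive's state on a given day.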
Tracking and reporting features. It's critical to track any activity that occurs with a file and report that activity to the storage administrator. In some cases, tracking and reporting merely help an administrator follow normal changes to the hundreds of millions of files retained in the archive. In other cases, tracking and reporting are essential elements of storage compliance. This may include tracking data migration between tiers, flagging search and access attempts to learn which users are attempting to find data, alerting IT when archived data is changed, and reporting on deletions to document the appropriate disposition of obsolete data.
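The tracking requirements above map naturally onto an event log with per-action reporting. This is a minimal in-memory sketch; the `Event` and `AuditTrail` names and the action vocabulary are hypothetical, and a compliance deployment would need durable, tamper-resistant storage for the log.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime
    user: str
    action: str  # hypothetical vocabulary: "migrate", "search", "access", "modify", "delete"
    target: str


class AuditTrail:
    def __init__(self) -> None:
        self._events: list[Event] = []
        self.alerts: list[str] = []

    def record(self, event: Event) -> None:
        self._events.append(event)
        if event.action == "modify":
            # Archived data should not normally change, so raise an alert for IT.
            self.alerts.append(f"ALERT: {event.user} modified {event.target}")

    def summary(self) -> Counter:
        """Count events per action for a periodic activity report."""
        return Counter(e.action for e in self._events)

    def deletions(self) -> list[Event]:
        """Events documenting the disposition of obsolete data."""
        return [e for e in self._events if e.action == "delete"]


trail = AuditTrail()
t = datetime(2008, 3, 3, 9, 0)
trail.record(Event(t, "policy-mgr", "migrate", "/mail/2007/q1.pst"))
trail.record(Event(t, "jdoe", "search", "invoice 4471"))
trail.record(Event(t, "jdoe", "search", "invoice 4472"))
trail.record(Event(t, "asmith", "modify", "/hr/emp-001.pdf"))
trail.record(Event(t, "retention", "delete", "/hr/emp-000.pdf"))
print(trail.summary())
print(trail.alerts)  # ['ALERT: asmith modified /hr/emp-001.pdf']
```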
Maintenance and TCO. Ultimately, any archiving platform or tool will cost more than the initial purchase price. Hardware platforms carry the additional expense of routine maintenance and potential upgrades. Software tools entail recurring costs such as annual licensing, patching and updates. By estimating total cost of ownership (TCO), storage managers can compare the pricing of archiving products more objectively.
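A rough TCO comparison is simple arithmetic over the cost categories named above. All figures below are hypothetical, the `Offer` class is illustrative, and the calculation is undiscounted; a more careful model might apply a discount rate to future years.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    purchase: float                # initial hardware/software price
    annual_maintenance: float
    annual_licensing: float
    planned_upgrades: float = 0.0  # one-time upgrade spend expected in the period

    def tco(self, years: int) -> float:
        """Undiscounted total cost of ownership over `years` years."""
        recurring = (self.annual_maintenance + self.annual_licensing) * years
        return self.purchase + recurring + self.planned_upgrades


offers = [
    Offer("Product A", purchase=50_000, annual_maintenance=7_000, annual_licensing=5_000),
    Offer("Product B", purchase=80_000, annual_maintenance=3_000, annual_licensing=1_000),
]
for offer in offers:
    print(offer.name, offer.tco(1), offer.tco(5))
```

With these made-up numbers, Product A is cheaper over one year (62,000 vs. 84,000), but Product B is cheaper over five years (100,000 vs. 110,000), which is why the comparison period should match the planned life of the archive.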
This was first published in March 2008