Media life and reliability. Archival retention requirements may exceed the reliable life of storage media such as disk, tape or optical disc. Prospective users must recognize the value of their archival data and understand the regulatory or legal issues associated with it. For organizations that need to store data for just a few years, a disk-based platform is usually sufficient. For long-term storage of data that is rarely accessed, removable optical media may be preferable. Disk storage users must also consider protective measures, such as remote replication or RAID groups with hot spares and accelerated disk rebuild features.
How important is immutability? Although it is possible to archive data on virtually any platform, content-addressed storage (CAS) prevents changes to stored data, which helps guarantee its authenticity. Not every organization needs this feature, but those that must address compliance or litigation may find it invaluable. Users may also opt for traditional write once, read many (WORM) media, such as CD-R or DVD-R, or emerging media like holographic disc. Disk-based CAS platforms, by contrast, provide virtual immutability in software, through their storage management tools, rather than relying on write-once media.
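Content addressing is what makes CAS resistant to silent modification: an object's address is derived from its content, so changing the data yields a new object rather than overwriting the old one. The minimal Python sketch below illustrates the idea, assuming SHA-256 as the content hash and an in-memory dictionary in place of real storage. `ContentAddressedStore` and its methods are hypothetical names; commercial CAS platforms layer metadata, replication and retention enforcement on top of this basic mechanism, and hash choices vary by vendor.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory content-addressed store (illustrative only).

    Each object's address is the SHA-256 digest of its bytes, so any change
    to the content produces a different address instead of overwriting the
    stored original.
    """

    def __init__(self):
        self._objects = {}  # address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Storing identical content again is a no-op; existing objects are never replaced.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def verify(self, address: str) -> bool:
        """Re-hash the stored bytes to confirm they still match their address."""
        return hashlib.sha256(self._objects[address]).hexdigest() == address


store = ContentAddressedStore()
original = store.put(b"Q3 financial report, final")
edited = store.put(b"Q3 financial report, final (edited)")
print(original != edited)     # True: the edit was stored as a new object
print(store.get(original))    # b'Q3 financial report, final'
print(store.verify(original)) # True
```

Because the address doubles as an integrity check, re-hashing an object at any later time shows whether its content has changed since it was stored.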
Evaluate storage capacity and data reduction technologies. Select an archive platform that can scale to meet future storage needs, usually by adding disks or clustered storage systems. Archives also depend on data reduction technologies to pare down the sheer amount of redundant data, and data deduplication is now standard on many archive platforms. Deduplication works best on long-term archives, where there is ample time to identify and eliminate duplicate data, but test this type of feature in the lab before making a purchase. [See Data deduplication explained for more details on this technology.]
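One common way to implement deduplication is to split data into chunks, fingerprint each chunk with a cryptographic hash, and store each unique chunk only once. The Python sketch below assumes fixed-size 4 KB chunks and SHA-256 fingerprints; both choices, and the names `deduplicate` and `restore`, are illustrative rather than how any particular archive product works (many use variable-size chunking, and some deduplicate after ingest rather than inline).

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


def deduplicate(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE):
    """Split each file into fixed-size chunks and store each unique chunk once.

    Returns (chunk_store, recipes): chunk_store maps digest -> chunk bytes,
    and recipes maps file name -> ordered list of chunk digests.
    """
    chunk_store: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)  # a duplicate chunk is not stored again
            digests.append(digest)
        recipes[name] = digests
    return chunk_store, recipes


def restore(name: str, chunk_store: dict[str, bytes], recipes: dict[str, list[str]]) -> bytes:
    """Rebuild a file by concatenating its chunks in recipe order."""
    return b"".join(chunk_store[d] for d in recipes[name])


files = {
    "report_v1.doc": b"A" * 8192 + b"B" * 4096,
    "report_v2.doc": b"A" * 8192 + b"C" * 4096,
}
store, recipes = deduplicate(files)
logical = sum(len(d) for d in files.values())
physical = sum(len(c) for c in store.values())
print(logical, physical)  # 24576 12288
```

Because the two files share their first two chunks, this toy store holds half of the logical data. Real-world savings depend heavily on how much redundancy the archived data actually contains, which is one reason to test with your own data in the lab.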
Consider practical object scalability. An archive typically indexes each file or other key element (each "object"), and later search requests run against that index as it grows over time. However, search performance can deteriorate as the archive grows to hundreds of millions or even billions of objects. Prospective users should evaluate archive performance with a full system, determine whether there are any practical limits to the number of files or objects, and evaluate any means of overcoming those limits. As an example, the Content Archive Platform from Hitachi Data Systems (HDS) can support a total archive size of up to 20 PB consisting of up to 32 billion objects.
Retention and deletion features. Although an archive stores data for the long term, data is not retained indefinitely. Organizations must retain necessary data and ensure that it is not deleted during its retention period. Conversely, data must be securely deleted once its retention period expires, unless it is involved in litigation, in which case it must be exempted from deletion (a litigation hold). The archive system must be able to implement retention policies, provide litigation hold for pertinent data and demonstrate secure deletion with a minimum of administrative oversight.
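The retention rule described above reduces to a simple eligibility check: an object may be deleted only after its retention period has ended, and only if no litigation hold applies. This Python sketch assumes each object carries a retention end date and a hold flag. The `ArchivedObject` fields and the `can_delete` and `deletion_candidates` functions are hypothetical; a real archive would also carry out the secure deletion itself and record it in an audit trail.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedObject:
    object_id: str
    retention_until: date          # last day of the required retention period
    litigation_hold: bool = False  # set when the object is relevant to litigation


def can_delete(obj: ArchivedObject, today: date) -> bool:
    """Eligible for deletion only after retention ends and with no hold in place."""
    return today > obj.retention_until and not obj.litigation_hold


def deletion_candidates(objects: list[ArchivedObject], today: date) -> list[str]:
    """IDs that a scheduled disposition job could pass on for secure deletion."""
    return [obj.object_id for obj in objects if can_delete(obj, today)]


objects = [
    ArchivedObject("invoice-2015", date(2022, 12, 31)),
    ArchivedObject("email-2016", date(2022, 12, 31), litigation_hold=True),
    ArchivedObject("contract-2021", date(2028, 6, 30)),
]
print(deletion_candidates(objects, date(2024, 1, 15)))  # ['invoice-2015']
```

Here the held email is protected even though its retention period has expired, and the contract is protected because its retention period has not.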
Archival data protection. Consider how your archive data will be guarded against disaster. Some archive platforms include backup-to-tape features for off-site storage, or provide other data protection capabilities, like data migration or data replication between storage systems. For example, IBM's DR550 supports local and long-distance mirroring, allowing synchronous or asynchronous replication between two DR550s. Chances are your data retention needs will far outlast the physical archive platform, so plan the logistics of moving current data onto a new platform and, later, moving it off that platform to some future storage system. Ideally, data migration should cause minimum disruption while preserving the retention policies, holds, indexes and other data attributes.
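One way to confirm that a migration preserved both the data and its attributes is to compare a content digest and the retention metadata for every object on the source and target systems. The Python sketch below assumes SHA-256 digests and a simplified metadata record holding only a retention end date and a litigation-hold flag. `ObjectRecord`, `record_for` and `verify_migration` are hypothetical names, and a real migration check would also cover indexes and any other attributes the platform keeps.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObjectRecord:
    object_id: str
    sha256: str             # digest of the object's content as read from that system
    retention_until: date
    litigation_hold: bool


def record_for(object_id: str, data: bytes, retention_until: date, hold: bool) -> ObjectRecord:
    return ObjectRecord(object_id, hashlib.sha256(data).hexdigest(), retention_until, hold)


def verify_migration(source: dict[str, ObjectRecord],
                     target: dict[str, ObjectRecord]) -> list[str]:
    """Return a list of problems; an empty list means every source object arrived intact."""
    problems = []
    for object_id, src in source.items():
        dst = target.get(object_id)
        if dst is None:
            problems.append(f"{object_id}: missing on target")
        elif dst.sha256 != src.sha256:
            problems.append(f"{object_id}: content digest mismatch")
        elif (dst.retention_until, dst.litigation_hold) != (src.retention_until, src.litigation_hold):
            problems.append(f"{object_id}: retention or hold metadata changed")
    return problems


keep_until = date(2030, 1, 1)
source = {
    "a": record_for("a", b"alpha", keep_until, hold=False),
    "b": record_for("b", b"beta", keep_until, hold=True),
}
target = {
    "a": record_for("a", b"alpha", keep_until, hold=False),
    "b": record_for("b", b"beta", keep_until, hold=False),  # hold lost in transit
}
print(verify_migration(source, target))  # ['b: retention or hold metadata changed']
```

In practice the target-side digests should be computed from data read back from the new system, not copied from the source, so that the check actually detects corruption.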
Data retrieval and security options. Before purchasing an archive, decide how "accessible" the archived data should be, and verify that the management tools can provide that level of accessibility while maintaining the necessary security. If network users should be able to search for, access or restore files as needed with minimal storage administrator involvement, make sure the management tool supports those tasks and enforces access restrictions that prevent casual or unauthorized access. Users should not be able to invoke or rescind litigation holds, and they should not be able to alter or destroy any data currently under litigation hold.
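Role-based permissions are one straightforward way to express these access rules. The Python sketch below assumes three hypothetical roles and a hypothetical permission table. It illustrates two points: holds can be set or released only by a designated role, and no role can delete an object while it is under litigation hold. Retention-period checks are left out for brevity, and modification is not modeled because archived objects are treated as immutable.

```python
from enum import Enum, auto


class Role(Enum):
    END_USER = auto()
    STORAGE_ADMIN = auto()
    LEGAL = auto()


class Action(Enum):
    SEARCH = auto()
    RESTORE = auto()
    DELETE = auto()
    SET_HOLD = auto()
    RELEASE_HOLD = auto()


# Hypothetical policy table: which roles may perform which actions.
PERMISSIONS = {
    Role.END_USER: {Action.SEARCH, Action.RESTORE},
    Role.STORAGE_ADMIN: {Action.SEARCH, Action.RESTORE, Action.DELETE},
    Role.LEGAL: {Action.SEARCH, Action.SET_HOLD, Action.RELEASE_HOLD},
}


def is_allowed(role: Role, action: Action, object_on_hold: bool) -> bool:
    # No role may delete an object that is under litigation hold.
    if action is Action.DELETE and object_on_hold:
        return False
    return action in PERMISSIONS.get(role, set())


print(is_allowed(Role.END_USER, Action.RESTORE, object_on_hold=True))       # True
print(is_allowed(Role.END_USER, Action.RELEASE_HOLD, object_on_hold=True))  # False
print(is_allowed(Role.STORAGE_ADMIN, Action.DELETE, object_on_hold=True))   # False
```

Restoring a held object here returns a copy to the user and leaves the archived original untouched, which is why it remains permitted.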
Total cost of ownership. Archive systems carry ongoing costs, such as future storage upgrades and annual system maintenance agreements. Power is another expense, especially for large archive systems full of spinning disks. Shop by total cost of ownership (TCO), not by lowest price. Features like wide-scale systematic disk spin-down (also known as MAID, or massive array of idle disks) and other disk power management techniques can dramatically reduce power requirements and ease overstretched power budgets.
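A rough TCO comparison needs only a few inputs: purchase price, upgrades, annual maintenance and energy use over the system's service life. The Python sketch below uses entirely hypothetical prices and power draws and an assumed electricity rate, and it ignores cooling, floor space and administration costs. It simply shows how a system with a higher purchase price but a lower power draw (for example, because of disk spin-down) can end up cheaper over several years.

```python
HOURS_PER_YEAR = 365 * 24


def total_cost_of_ownership(purchase, annual_maintenance, avg_power_watts,
                            price_per_kwh, years=5, upgrades=0.0):
    """Purchase + upgrades + maintenance + electricity over `years` (cooling not included)."""
    energy_kwh = avg_power_watts / 1000 * HOURS_PER_YEAR * years
    return purchase + upgrades + annual_maintenance * years + energy_kwh * price_per_kwh


# Hypothetical figures for two competing systems, at an assumed $0.10 per kWh.
always_spinning = total_cost_of_ownership(100_000, 10_000, 4_000, 0.10)
with_spin_down = total_cost_of_ownership(110_000, 10_000, 1_500, 0.10)
print(f"{always_spinning:,.0f} vs {with_spin_down:,.0f}")  # 167,520 vs 166,570
```

In this made-up example, the $10,000 premium for the lower-power system is recovered through electricity savings within five years; with real quotes and local power rates, the result could easily go the other way.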
Protocol support and connectivity. Choose an archive platform that can integrate properly with your current environment. For example, the StorageTek 5800 storage system from Sun Microsystems Inc. includes two 1 Gigabit Ethernet ports in each system. By comparison, the FAS3020 from NetApp includes 20 Fibre Channel ports and 24 regular Ethernet ports. The Content Archive Platform from HDS supports a variety of common network protocols, including NFS, CIFS/SMB, HTTP, WebDAV, SMTP and NDMP.
The archive hardware specifications page in this chapter covers the following products: