An archive is a collection of data moved to a repository for backup, to keep it separate for compliance reasons or to move it off primary storage media. It can consist of a simple list of files or of files organized under a directory or catalog structure, depending on how a particular program supports archiving.
Web and File Transfer Protocol (FTP) sites that provide downloadable software sometimes refer to the list of downloadable files as an archive or archives.
Backup vs. archive
While data backup and archiving are similar, they differ in purpose and design. Backups are copies of data stored so that the data can be recovered in the case of corruption. These copies are typically created using replication or mirroring and are updated as files change. Backup is short-term storage that needs to perform well enough to restore data quickly. Backups are usually stored as blocks to facilitate the recovery of large amounts of data at one time.
Archived data is not a copy, but rather inactive and rarely altered data that needs to be retained for long periods of time. Performance is less critical in archive storage. Rather than being stored in blocks, archived data is usually kept as a file or object with metadata attached, so that granular access to the data is possible.
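To illustrate how attached metadata makes granular access possible, the following is a minimal sketch rather than any particular product's interface. The ArchivedObject and ArchiveStore classes, the in-memory dictionary standing in for a real repository, and the metadata fields (department, year) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedObject:
    """One archived item: the payload plus descriptive metadata."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ArchiveStore:
    """Hypothetical in-memory stand-in for an archive repository."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Store the payload together with its metadata under one key.
        self._objects[key] = ArchivedObject(key, data, dict(metadata))

    def find(self, **criteria):
        """Return sorted keys of objects whose metadata matches every criterion."""
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ArchiveStore()
store.put("invoices/2015-03.pdf", b"...", department="finance", year=2015)
store.put("invoices/2016-03.pdf", b"...", department="finance", year=2016)
store.put("contracts/acme.pdf", b"...", department="legal", year=2015)

# Locate individual items by metadata instead of restoring a whole block image.
matches = store.find(department="finance", year=2015)
```

Because each object carries its own metadata, a query can pick out a single item without reading the rest of the archive, which is the kind of granular access described above.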
Archive storage options
Archive storage typically needs to hold large amounts of data for long periods of time at a low cost. The following storage options are commonly used for archived data:
Tape: Tape is an effective archival medium because of its low cost. However, data stored on tape takes longer to access than data on other storage options. For that reason, tape is most often used as a long-term archival location, where data is unlikely to be accessed.
Cloud: The cloud is a favorable archival option because it scales easily and removes the cost of hardware, power and cooling. However, for large data centers with continuously growing archives, the ongoing cost of cloud storage can add up. Some major cloud providers offer cloud archiving platforms that trade slower performance for a lower cost.
Object: Object storage is an effective archival option because it can store large amounts of metadata, which is essential for locating and accessing data easily. Object storage is also low-cost and can hold large amounts of data.
Enterprise data archiving tools
Archiving software moves data from production storage to archive storage as needed. Many archiving software products can automatically offload data to the archive storage location based on user-created policies or as the data becomes less frequently accessed. Some archiving software connects directly to a cloud provider, while other software helps tape or object storage act as an extension of the disk used to store production data.
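A policy-based offload of this kind can be sketched as follows. This is an illustration under stated assumptions, not how any specific archiving product works: the function names, the max_idle_days policy parameter and the use of two local directories in place of production and archive storage are all hypothetical. The sketch also uses each file's last-modified time as a stand-in for "less frequently accessed", because last-access times are often not updated reliably by file systems.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def select_for_archive(production_dir, max_idle_days, now=None):
    """Return files under production_dir not modified within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    return sorted(
        path for path in Path(production_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def archive_files(production_dir, archive_dir, max_idle_days, now=None):
    """Move idle files to archive_dir, preserving their relative paths.

    archive_dir is assumed to lie outside production_dir.
    """
    moved = []
    for src in select_for_archive(production_dir, max_idle_days, now):
        dest = Path(archive_dir) / src.relative_to(production_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Calling archive_files("prod", "archive", max_idle_days=365) would move every file under prod that has not changed in a year. A real tool would typically also record where each file went so that it can be recalled later.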
In many cases, archiving and backup software are integrated. To improve response times when data is accessed, some software can also cache segments of archived data on disk while the majority remains on object or tape storage.
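The caching idea can be sketched with a small least-recently-used (LRU) cache placed in front of a slow archive tier. This is a simplified illustration, not a vendor's design: the CachedArchive class, the fetch_from_archive callback and the choice of LRU eviction are assumptions, and an in-memory dictionary stands in for the disk cache.

```python
from collections import OrderedDict


class CachedArchive:
    """LRU cache of segments in front of a slow archive fetch function."""

    def __init__(self, fetch_from_archive, capacity):
        self._fetch = fetch_from_archive  # slow path: tape or object storage
        self._capacity = capacity         # maximum number of cached segments
        self._cache = OrderedDict()       # stand-in for the disk cache
        self.hits = 0
        self.misses = 0

    def read(self, segment_id):
        if segment_id in self._cache:
            self._cache.move_to_end(segment_id)  # mark as most recently used
            self.hits += 1
            return self._cache[segment_id]
        self.misses += 1
        data = self._fetch(segment_id)           # go to the slow tier
        self._cache[segment_id] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)      # evict least recently used
        return data


def slow_fetch(segment_id):
    # Hypothetical placeholder for a tape or object storage read.
    return f"segment-{segment_id}".encode()


archive = CachedArchive(slow_fetch, capacity=2)
for segment_id in [1, 2, 1, 3, 2]:
    archive.read(segment_id)
```

Only the repeated read of a segment still in the cache avoids the slow tier, so the benefit depends on how often recently used segments are requested again and on how many segments the cache can hold.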