Data retrieval strategies: Retrieving data from archives overview

09 May 2007 | SearchStorage.com

Archives retain business data that is accessed infrequently, yet must be kept for a prolonged period of time. For example, the medical records of adult patients generally need to be kept for at least seven years, but doctor's notes, X-rays and MRI images may only be referenced during office visits a few times each year. Data retrieval poses special challenges for archival storage systems, which must provide the capacity and reliability to retain data for many years, protect that data against unauthorized changes and quickly locate specific pieces of data in the archive on demand. This overview highlights the key issues involved with retrieving data from archives.

Immutability

Archives are often implemented to meet local or national compliance regulations governing the availability and retention of data, most often in the financial and healthcare industries. Some of the best-known compliance regulations include the Sarbanes-Oxley Act (SOX) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA), but there are about 15,000 other regulations that businesses need to be aware of as well. See the article Storage Compliance Explained for more details.

Immutability is often a main attribute of archival storage systems. That is, once data is committed to an archive, it cannot be changed or deleted until its retention period has expired. This is often referred to as a write once, read many (WORM) archive, and it is commonly implemented with content-addressed storage (CAS). Files are typically assigned a unique identifier that is stored along with the data when it's written to the archive; in CAS systems, that identifier is usually derived from a hash of the content, so any later change to the data can be detected. In many cases, any data retrieved to support litigation must be from an immutable archive -- otherwise it is difficult to demonstrate the authenticity of the data. Some archives can also export to tape or virtual tape libraries (VTL) so that the archive itself can be backed up periodically.
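To make the write-once idea concrete, here is a minimal Python sketch of a content-addressed store with a retention check. It is an illustration only, not how any particular archive product works: the WormArchive class, the RetentionError exception and the choice of SHA-256 as the identifier are all assumptions made for the example, and the store lives in memory.

```python
import hashlib
from datetime import datetime, timedelta


class RetentionError(Exception):
    """Raised when a delete is attempted before the retention period ends."""


class WormArchive:
    """Toy in-memory write-once store keyed by the SHA-256 of the content."""

    def __init__(self):
        self._objects = {}  # content address -> (data, retain_until)

    def put(self, data: bytes, retain_days: int, now: datetime) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Write once: existing content is never overwritten, and its
        # retention date is never shortened by a later put().
        if address not in self._objects:
            self._objects[address] = (data, now + timedelta(days=retain_days))
        return address

    def get(self, address: str) -> bytes:
        data, _ = self._objects[address]
        # Recompute the hash on read so that tampering would be detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def delete(self, address: str, now: datetime) -> None:
        _, retain_until = self._objects[address]
        if now < retain_until:
            raise RetentionError(f"retained until {retain_until.isoformat()}")
        del self._objects[address]
```

Because the identifier is computed from the content, a reader can verify authenticity simply by hashing what was retrieved and comparing it with the address used to request it.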

Extending storage capacity

Archival storage capacity is always a concern since data is, as mentioned above, generally immutable and cannot be deleted until the retention period expires. This requires careful capacity management to ensure that the archive does not run out of space. One of the major technologies used to extend capacity is data deduplication, also called intelligent compression or single-instance storage.

Data deduplication works by eliminating redundant data from the archive -- saving only one unique instance of the file, block or byte sequence to the archive and replacing subsequent instances with a small pointer to the saved copy. In normal operation, a deduplicated archive can achieve effective reduction ratios ranging from 10:1 to 50:1. Today, most archives employ block- or byte-level data deduplication to reduce storage demands.
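The pointer mechanism can be sketched in a few lines of Python. This is a simplified, fixed-size block-level example under stated assumptions: the DedupStore class and its methods are hypothetical names, the block size is deliberately tiny so the effect is visible, and the reported ratio ignores the space taken by the pointers themselves. Commercial products typically use more sophisticated chunking.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplication: each unique block is stored once."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (the single saved copy)
        self.files = {}   # file name -> ordered list of digests (pointers)

    def write(self, name: str, data: bytes) -> None:
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Save the block only if it has not been seen before.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name: str) -> bytes:
        # Rebuild the file by following its pointers.
        return b"".join(self.blocks[d] for d in self.files[name])

    def ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        logical = sum(len(self.blocks[d])
                      for ptrs in self.files.values() for d in ptrs)
        physical = sum(len(b) for b in self.blocks.values())
        return logical / physical if physical else 1.0
```

Writing b"ABCDABCDABCD" and then b"ABCDEFGH" to such a store keeps only two unique four-byte blocks for 20 bytes of logical data, a ratio of 2.5 to 1.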

Index and search

An archive can eventually contain hundreds of gigabytes or more spread out across hundreds of millions of unique files. Without help, retrieving important data months or years later would be problematic at best, so powerful indexing and searching capabilities are an essential element of many archive platforms.

Indexing generates metadata about each file, and possibly about the contents of the file, and then organizes that metadata into a database or repository with indices that can be searched efficiently at a later date. Metadata may include details like a filename, description, creator, creation date, key search words, and many other items that are often customized to meet the unique needs of each company. The index may be stored on the archive along with the data.
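As a rough illustration of a metadata index, the following Python sketch uses the standard sqlite3 module. The table name, column names and helper functions are assumptions chosen for the example and are not taken from any specific archive platform; dates are stored as ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3


def build_index(records):
    """Create an in-memory metadata index from
    (object_id, filename, creator, created, keywords) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE archive_index (
               object_id TEXT PRIMARY KEY,
               filename  TEXT,
               creator   TEXT,
               created   TEXT,   -- ISO 8601 date, e.g. 2007-05-09
               keywords  TEXT
           )"""
    )
    # A database index on the columns used for lookups keeps searches fast.
    conn.execute(
        "CREATE INDEX idx_creator_created ON archive_index (creator, created)"
    )
    conn.executemany(
        "INSERT INTO archive_index VALUES (?, ?, ?, ?, ?)", records
    )
    conn.commit()
    return conn


def find_by_creator(conn, creator, start, end):
    """Return (object_id, filename) pairs for one creator in a date range."""
    cursor = conn.execute(
        "SELECT object_id, filename FROM archive_index "
        "WHERE creator = ? AND created BETWEEN ? AND ? ORDER BY created",
        (creator, start, end),
    )
    return cursor.fetchall()
```

The index holds only the metadata and an object identifier; the file itself stays in the archive and is fetched by that identifier once the search has found it.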

Search tools locate the data for retrieval. Depending on the search tool, searches can use the metadata indexes or even "look inside" some files, such as documents or PDF files, to perform deeper contextual searches of file content. For example, a healthcare provider might search for records based on patient name, provider ID and dates of service. Similarly, broader searches might be performed for all patients sharing the same illness/diagnosis or prescribed drugs. In many cases, search results are displayed by relevance in a Web browser-based display similar to Google.
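A content search that "looks inside" files and ranks results by relevance can be approximated with a small inverted index. The Python sketch below is a simplification under stated assumptions: the function names are hypothetical, the documents are assumed to be already extracted to plain text, and relevance is nothing more than a count of matching terms, which is far cruder than the ranking real search tools use.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Return ids of documents containing every query term,
    most occurrences first (ties broken by document id)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    scores = {doc: sum(p[doc] for p in postings) for doc in matching}
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

With this structure, a broad query such as a diagnosis name returns every record that mentions it, while adding a patient name narrows the result to records containing both terms.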

Do not underestimate the business importance of indexing and searching. Retrieving needed files is crucial for compliance audits, e-discovery and litigation support activities. When a demand for discovery is made, a company typically has only weeks to locate and provide the required data. Failure to produce data in a timely fashion can have severe financial consequences for a business.

Security and retention

Data retrieval from an archive should also be restricted to authorized personnel -- especially if the archive is not immutable. Credentials should be required to authenticate each user, and a detailed activity log should capture file access and track other user activities within the archive. Solid security precautions will reduce the chance that files are altered or deleted unexpectedly.
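The combination of per-user credentials and an activity log might look something like the Python sketch below. Everything here is illustrative: the AuditedArchive class, its method names and the log layout are assumptions for the example, and a real deployment would normally rely on the archive platform's own authentication and logging rather than hand-written code.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


class AuditedArchive:
    """Toy read-only archive that authenticates users and logs every access."""

    def __init__(self, files):
        self._files = dict(files)  # object_id -> bytes
        self._users = {}           # username -> (salt, password hash)
        self.audit_log = []        # (timestamp, user, action, object_id, outcome)

    @staticmethod
    def _hash(password, salt):
        # Salted PBKDF2 so that plain-text passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def _authenticate(self, username, password):
        entry = self._users.get(username)
        if entry is None:
            return False
        salt, expected = entry
        return hmac.compare_digest(self._hash(password, salt), expected)

    def _log(self, username, object_id, outcome):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, username, "read", object_id, outcome))

    def read(self, username, password, object_id):
        # Failed attempts are logged as well as successful ones.
        if not self._authenticate(username, password):
            self._log(username, object_id, "denied")
            raise PermissionError(f"access denied for {username}")
        if object_id not in self._files:
            self._log(username, object_id, "not_found")
            raise KeyError(object_id)
        self._log(username, object_id, "granted")
        return self._files[object_id]
```

Recording denied and failed requests alongside successful reads is a deliberate choice in this sketch, since an audit trail is most useful when it also shows who tried to reach data they were not entitled to see.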

Archives should also be implemented with well-defined data retention and deletion policies in place. Archived data must often be available for retrieval over years -- even decades -- so retention is important to meet compliance and legal obligations. Retention periods can vary by file type and may be set in metadata during the file archiving process. Once set, a retention period generally cannot be changed, and it remains in force until the file is deleted.

Deletion is often an overlooked aspect of retention. Experts suggest that there is greater legal exposure in retaining unnecessary data (past its retention period) than in deleting it, so data should be securely destroyed as soon as its retention period expires. This also frees up valuable space on the archive. The archive platform itself will generally provide the software needed to secure the system, and to set retention and deletion policies.
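A retention policy keyed by file type, together with a periodic sweep for expired items, can be sketched as follows in Python. The file types, the three-year email period and the function names are hypothetical values chosen for the example (only the seven-year figure for medical records echoes the text above), and the sketch merely drops catalog entries; it does not perform the secure destruction of the underlying data.

```python
from datetime import date

# Hypothetical policy table: retention period in years per file type.
RETENTION_YEARS = {"medical_record": 7, "email": 3}


def expiry_date(archived_on: date, file_type: str) -> date:
    """Return the date on which an item's retention period ends."""
    years = RETENTION_YEARS[file_type]
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February archived in a leap year, expiring in a non-leap year.
        return archived_on.replace(year=archived_on.year + years, day=28)


def purge_expired(catalog, today: date):
    """Remove expired entries from catalog and return their ids.

    catalog maps object_id -> (file_type, archived_on).
    """
    expired = sorted(
        object_id
        for object_id, (file_type, archived_on) in catalog.items()
        if expiry_date(archived_on, file_type) <= today
    )
    for object_id in expired:
        del catalog[object_id]
    return expired
```

Running a sweep like this on a schedule is one way to ensure that data is removed promptly when its retention period ends rather than lingering indefinitely.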



All Rights Reserved, Copyright 2000 - 2008, TechTarget