data deduplication
SearchStorage.com Definitions (Powered by WhatIs.com)

DEFINITION -

Data deduplication (often called "intelligent compression" or "single-instance storage") is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy. For example, a typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy. In this example, a 100 MB storage demand could be reduced to roughly 1 MB, plus the small overhead of the pointers.
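
The email example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name store_single_instance, the use of SHA-1 as the content key, and the in-memory dictionary are all hypothetical choices, not a description of any particular product.

```python
import hashlib

def store_single_instance(attachments):
    """Keep one copy of each distinct attachment; every instance becomes a pointer."""
    unique_store = {}   # content key -> the single retained copy
    pointers = []       # one pointer (the content key) per original instance
    for data in attachments:
        key = hashlib.sha1(data).hexdigest()
        unique_store.setdefault(key, data)  # stored only the first time it is seen
        pointers.append(key)
    return unique_store, pointers

ONE_MB = 1024 * 1024
attachment = b"x" * ONE_MB          # stand-in for a 1 MB file attachment
mailbox = [attachment] * 100        # 100 instances of the same attachment

unique_store, pointers = store_single_instance(mailbox)
raw_bytes = sum(len(a) for a in mailbox)
stored_bytes = sum(len(d) for d in unique_store.values())
print(raw_bytes // ONE_MB, "MB before,", stored_bytes // ONE_MB, "MB after")
```

The 100 pointers still have to be kept, which is why the saving in a real system is slightly less than the full 99 MB.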

Data deduplication offers other benefits. Lower storage space requirements save money on disk expenditures. The more efficient use of disk space also allows backups to be retained on disk for longer periods, which keeps fast disk-based restores available (supporting better recovery time objectives, or RTOs) and reduces the need for tape backups. Data deduplication also reduces the amount of data that must be sent across a WAN for remote backups, replication, and disaster recovery.

Data deduplication can generally operate at the file, block, or even the bit level. File deduplication eliminates duplicate files (as in the example above), but this is not a very efficient means of deduplication. Block and bit deduplication look within a file and save unique iterations of each block or bit. Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This process generates a number for each piece that is, for practical purposes, unique; the number is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or presentation are changed, only the changed blocks or bytes are saved; the changes don't constitute an entirely new file. This behavior makes block and bit deduplication far more efficient. However, block and bit deduplication take more processing power and use a much larger index to track the individual pieces.
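
A minimal sketch of block-level deduplication follows. It assumes fixed-size 4 KB blocks, SHA-1 as the hash algorithm, and a plain dictionary as the index; the names dedupe_blocks and rebuild are hypothetical, and real products may chunk and index data quite differently.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration

def dedupe_blocks(data, index, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and store only blocks not yet indexed.

    Returns the ordered list of block hashes needed to rebuild the data,
    and the number of new blocks that had to be stored.
    """
    recipe = []
    new_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in index:      # unseen hash: store the block
            index[digest] = block
            new_blocks += 1
        recipe.append(digest)        # seen or not, the file references the hash
    return recipe, new_blocks

def rebuild(recipe, index):
    """Reassemble the original bytes from the stored blocks."""
    return b"".join(index[d] for d in recipe)

index = {}
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))  # 8 distinct blocks
recipe_v1, stored_v1 = dedupe_blocks(original, index)

# Update the file: change four bytes inside the third block only.
updated = bytearray(original)
updated[2 * BLOCK_SIZE + 10:2 * BLOCK_SIZE + 14] = b"EDIT"
updated = bytes(updated)
recipe_v2, stored_v2 = dedupe_blocks(updated, index)

print(stored_v1, stored_v2)  # prints "8 1"
```

Saving the updated file stores one new block rather than eight, which is the efficiency gain described above. The cost is also visible: the index holds an entry per block instead of an entry per file.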

Hash collisions are a potential problem with deduplication. When a piece of data receives a hash number, that number is compared with the index of existing hash numbers. If the hash number is already in the index, the piece of data is considered a duplicate and does not need to be stored again. Otherwise, the new hash number is added to the index and the new data is stored. In rare cases, the hash algorithm may produce the same hash number for two different chunks of data. When such a hash collision occurs, the system won't store the new data because it sees that its hash number already exists in the index. This is called a false positive, and it can result in data loss. Some vendors combine hash algorithms to reduce the possibility of a hash collision. Some vendors are also examining metadata to identify data and prevent collisions.
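
The false-positive scenario can be reproduced with a deliberately weak toy hash, since collisions in MD5 or SHA-1 are far too rare to trigger on demand. In the sketch below, weak_hash, combined_hash, and store are hypothetical names; combining MD5 and SHA-1 is just one assumed way of "combining hash algorithms", not a specific vendor's method.

```python
import hashlib

def weak_hash(chunk):
    """Toy hash (sum of bytes mod 256), weak on purpose so collisions are easy to show."""
    return sum(chunk) % 256

def combined_hash(chunk):
    """Combine two algorithms: a false positive now needs a collision in both at once."""
    return hashlib.md5(chunk).hexdigest() + hashlib.sha1(chunk).hexdigest()

def store(chunks, hash_fn):
    """Index chunks by hash; a chunk whose hash is already present is not stored."""
    index = {}
    for chunk in chunks:
        key = hash_fn(chunk)
        if key in index:
            continue  # treated as a duplicate, even if the bytes actually differ
        index[key] = chunk
    return index

chunk_a = b"ab"
chunk_b = b"ba"  # different data, but the same byte sum as chunk_a

weak_index = store([chunk_a, chunk_b], weak_hash)
strong_index = store([chunk_a, chunk_b], combined_hash)
print(len(weak_index), len(strong_index))  # prints "1 2"
```

With the weak hash, chunk_b is silently dropped, which is the data loss described above. With the combined key, both chunks are stored.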

In actual practice, data deduplication is often used in conjunction with other forms of data reduction such as conventional compression and delta differencing. Taken together, these three techniques can be very effective at optimizing the use of storage space.
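
As a rough illustration of layering two of these techniques, the sketch below deduplicates blocks first and then applies conventional compression (zlib) to each unique block. Delta differencing is not shown. The function name dedupe_then_compress and the sample data are assumptions made purely for the example.

```python
import hashlib
import zlib

def dedupe_then_compress(blocks):
    """Store each unique block once, compressed with zlib."""
    store = {}
    for block in blocks:
        key = hashlib.sha1(block).hexdigest()
        if key not in store:              # deduplication step
            store[key] = zlib.compress(block)  # conventional compression step
    return store

# Ten identical blocks plus one different block (sample data only).
blocks = [b"backup data " * 400] * 10 + [b"other data " * 400]

raw_size = sum(len(b) for b in blocks)
store = dedupe_then_compress(blocks)
stored_size = sum(len(c) for c in store.values())
print(raw_size, "bytes before,", stored_size, "bytes after")
```

Deduplication removes the nine repeated blocks, and compression then shrinks the two blocks that remain.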


CONTRIBUTORS: Stephen J. Bigelow
LAST UPDATED: 18 Nov 2009

All Rights Reserved, Copyright 2000 - 2009, TechTarget