data deduplication
SearchStorage.com Definitions (Powered by WhatIs.com)

Data deduplication (often called "intelligent compression" or "single-instance storage") is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy. For example, a typical email system might contain 100 instances of the same one megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just a reference back to the one saved copy. In this example, a 100 MB storage demand could be reduced to only 1 MB.
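
The single-instance idea in the email example can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than any vendor's implementation: the class name `SingleInstanceStore`, the use of SHA-256 as the content fingerprint, and the in-memory dictionaries standing in for storage media are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical file-level store: one copy per unique content, pointers elsewhere."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes (the single stored instance)
        self.pointers = {}  # file name -> digest (the "pointer" to the unique copy)

    def save(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:   # first time this content is seen: store it
            self.blobs[digest] = data
        self.pointers[name] = digest   # every name just references the stored copy

    def load(self, name):
        return self.blobs[self.pointers[name]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


MB = 1024 * 1024
attachment = b"x" * MB  # stand-in for the same 1 MB attachment
store = SingleInstanceStore()
for i in range(100):
    store.save(f"mailbox{i}/report.pdf", attachment)

logical = 100 * len(attachment)
print(logical // MB, store.stored_bytes() // MB)  # 100 1
```

Each of the 100 names holds only a digest pointing at the single stored copy, so the logical 100 MB occupies about 1 MB.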

Data deduplication offers other benefits. Lower storage space requirements save money on disk expenditures. The more efficient use of disk space also allows longer disk retention periods, so older backups can still be restored from disk rather than tape. This improves recovery time objectives (RTO) over a longer period and reduces the need for tape backups. Data deduplication also reduces the amount of data that must be sent across a WAN for remote backups, replication, and disaster recovery.
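
A rough sketch of why deduplication cuts WAN traffic: before sending a chunk to a remote site, the sender checks whether the remote already holds a chunk with the same fingerprint and sends only the missing ones. The function `replicate`, the 4 KiB chunk size, and the dictionary `remote` (standing in for the remote site's index, with each assignment standing in for a transfer) are hypothetical. A real system would also exchange the digests themselves, which this sketch does not count.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size for illustration


def replicate(data, remote_index):
    """Copy data to a (simulated) remote site, sending only chunks it lacks.

    Returns the number of chunk bytes that would cross the WAN.
    """
    sent = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest not in remote_index:
            remote_index[digest] = chunk  # stands in for a WAN transfer
            sent += len(chunk)
    return sent


remote = {}  # hypothetical remote index: digest -> chunk
night1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))  # 10 distinct chunks
night2 = night1[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE           # only the last chunk differs

first = replicate(night1, remote)
second = replicate(night2, remote)
print(first, second)  # 40960 4096
```

The second night's backup is the same size as the first, but only the one changed chunk has to cross the WAN.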

Data deduplication can generally operate at the file, block, and even the bit level. File deduplication eliminates duplicate files (as in the example above), but this is not a very efficient means of deduplication: changing even one byte makes the whole file unique, so an entire new copy must be stored. Block and bit deduplication look within a file and save unique iterations of each block or bit. Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This process generates a number (the hash) that is, for practical purposes, unique to each piece, and that number is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or presentation are changed, only the changed blocks or bytes are saved; the changes don't constitute an entirely new file. This behavior makes block and bit deduplication far more efficient. However, block and bit deduplication take more processing power and use a much larger index to track the individual pieces.
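
Block-level deduplication can be sketched the same way, using SHA-1 as the text mentions. This is an illustrative assumption, not a product design: it splits data into fixed 4 KiB blocks (the text does not specify a block size or chunking method), keeps an index of block digests, and records each file as a list of digests (a hypothetical "recipe") that can be reassembled on read.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class BlockStore:
    def __init__(self):
        self.index = {}  # SHA-1 digest -> block bytes

    def write(self, data):
        """Split data into fixed-size blocks, store unseen ones, return (recipe, new_block_count)."""
        recipe = []
        new_blocks = 0
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha1(block).digest()
            if digest not in self.index:  # only unique blocks are stored
                self.index[digest] = block
                new_blocks += 1
            recipe.append(digest)
        return recipe, new_blocks

    def read(self, recipe):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.index[d] for d in recipe)


v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(16))  # 16 distinct blocks
store = BlockStore()
recipe1, new1 = store.write(v1)

v2 = bytearray(v1)
v2[5000:5003] = b"new"  # overwrite a few bytes inside the second block
recipe2, new2 = store.write(bytes(v2))

print(new1, new2, len(store.index))  # 16 1 17
```

Editing three bytes in the second version adds only one new block to the index. With fixed-size blocks, inserting bytes (rather than overwriting them) shifts every later block boundary, which is one reason some systems use variable-size chunking instead.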

Hash collisions are a potential problem with deduplication. When a piece of data receives a hash number, that number is compared with the index of existing hash numbers. If the hash number is already in the index, the piece of data is considered a duplicate and is not stored again. Otherwise, the new hash number is added to the index and the new data is stored. In rare cases, the hash algorithm may produce the same hash number for two different chunks of data. When such a hash collision occurs, the system won't store the new data because its hash number already exists in the index. This is called a false positive, and it can result in data loss. Some vendors combine hash algorithms to reduce the possibility of a hash collision. Some vendors are also examining metadata to identify data and prevent collisions.
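
To make the false-positive scenario concrete, the sketch below runs the same index lookup with two fingerprints. `weak_hash` is a deliberately tiny, made-up 8-bit checksum chosen so that two different chunks collide; `combined_hash` pairs MD5 and SHA-1, mirroring the note that some vendors combine hash algorithms. The function names and the `dedupe` helper are hypothetical.

```python
import hashlib


def weak_hash(chunk):
    # Deliberately tiny 8-bit checksum so a collision is easy to trigger (illustration only)
    return sum(chunk) % 256


def combined_hash(chunk):
    # Two algorithms used together; MD5 and SHA-1 chosen to match the text
    return (hashlib.md5(chunk).digest(), hashlib.sha1(chunk).digest())


def dedupe(chunks, key_fn):
    """Store chunks by fingerprint and return what a restore would produce."""
    index = {}
    recipe = []
    for chunk in chunks:
        key = key_fn(chunk)
        if key not in index:  # new hash: add it to the index and store the data
            index[key] = chunk
        recipe.append(key)    # known hash: treated as a duplicate, data not stored
    return [index[k] for k in recipe]


chunks = [b"ab", b"ba"]  # different data with the same byte sum
print(dedupe(chunks, weak_hash))      # [b'ab', b'ab']
print(dedupe(chunks, combined_hash))  # [b'ab', b'ba']
```

With the weak hash, the second chunk is wrongly treated as a duplicate and a restore returns the first chunk in its place, which is exactly the data loss a false positive causes. Combining hashes makes such a collision far less likely but does not, in principle, make it impossible; some systems also compare the actual bytes before discarding a chunk.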

In actual practice, data deduplication is often used in conjunction with other forms of data reduction such as conventional compression and delta differencing. Taken together, these three techniques can be very effective at optimizing the use of storage space.
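
A small sketch of deduplication followed by conventional compression, under stated assumptions: fixed 4 KiB blocks, SHA-256 fingerprints, and Python's standard `zlib` module standing in for whatever compressor a product would use. The helper name `reduction_sizes` is hypothetical, and delta differencing is not shown.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size


def reduction_sizes(data):
    """Return (logical, after_dedupe, after_dedupe_and_compression) sizes in bytes."""
    unique = {}
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        unique.setdefault(hashlib.sha256(block).digest(), block)  # keep first copy only
    deduped = sum(len(b) for b in unique.values())
    compressed = sum(len(zlib.compress(b)) for b in unique.values())  # compress unique blocks
    return len(data), deduped, compressed


block_a = (b"quarterly report " * 300)[:BLOCK_SIZE]
block_b = (b"meeting notes " * 300)[:BLOCK_SIZE]
data = (block_a + block_b) * 50  # 100 blocks, only 2 of them unique

logical, deduped, compressed = reduction_sizes(data)
print(logical, deduped, compressed)
```

For this highly repetitive input, the logical 409,600 bytes dedupe to 8,192 bytes (two unique blocks), and compressing those blocks shrinks them further; real-world results depend heavily on the data.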

CONTRIBUTORS: Stephen J. Bigelow
LAST UPDATED: 21 Dec 2009

All Rights Reserved, Copyright 2000 - 2010, TechTarget