Data reduction in primary storage (DRIPS!)


This article can also be found in the Premium Editorial Download "Storage magazine: Slimmer storage: How data reduction systems work."


Pros and cons of reduction technologies

Thin provisioning is a good technology for reducing the size of initial primary allocations, as most applications don’t use their full space allocation at creation time and end-user storage capacity is typically overspecified to accommodate future growth. While savings from thin provisioning can be as high as 30%, the ongoing benefits require maintenance and monitoring to ensure storage “stays thin.” Vendor implementations take radically different approaches to achieve this but, regardless of approach, all thin provisioning deployments require host-based support. In addition, as discussed earlier, thin provisioning tackles overallocation of resources, so it won’t realize any savings where logical storage capacity is already fully utilized physically.
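To make the allocation behavior concrete, here is a minimal Python sketch of a thin-provisioned volume. It is an illustration only, not any vendor's implementation: the `ThinVolume` class, its method names and the `unmap` call (standing in for host-based reclamation such as SCSI UNMAP or TRIM) are all hypothetical.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical space is consumed on first write."""

    def __init__(self, logical_blocks: int, block_size: int = 4096):
        self.logical_blocks = logical_blocks
        self.block_size = block_size
        # Only blocks that have actually been written occupy an entry here.
        self._blocks = {}

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("LBA outside the logical volume")

    def write(self, lba: int, data: bytes) -> None:
        self._check(lba)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self._blocks[lba] = data

    def read(self, lba: int) -> bytes:
        self._check(lba)
        # Unwritten blocks read back as zeros without consuming physical space.
        return self._blocks.get(lba, bytes(self.block_size))

    def unmap(self, lba: int) -> None:
        """Host tells the array a block is no longer needed (keeps the volume thin)."""
        self._check(lba)
        self._blocks.pop(lba, None)

    @property
    def physical_blocks(self) -> int:
        return len(self._blocks)

    def savings(self) -> float:
        """Fraction of the logical allocation that consumes no physical space."""
        return 1 - self.physical_blocks / self.logical_blocks
```

In this model, a volume with every logical block written reports savings of zero, which is the fully utilized case described above. Without the `unmap` step, blocks freed by the host would stay allocated and the volume would gradually stop being thin.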

Compression is a simple technology to deploy, requiring no user intervention in normal operation, but there are two factors to consider when using the technology. First, compressed data needs to be “rehydrated” before a user can access it, which can add latency to reads because the data is uncompressed in memory before the I/O request is serviced; the compression algorithm likewise adds latency to the write I/O cycle. Second, when data is changed during a write operation, the newly compressed data can be larger than the original, making it impossible to save the data back to its original location. This introduces additional computation, especially when RAID parity calculations are involved. However, as processing power has increased (especially with today’s Intel Xeon processors), the computing overhead of compression is becoming less of a problem. The savings from compression are highly dependent on the type of data in use, but reductions can be significant with highly structured data such as databases.
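Both points about compression can be shown with a short Python sketch using the standard `zlib` module. It is only an illustration: storage arrays use their own algorithms and block layouts, and the helper names `compression_ratio` and `fits_in_place` are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


def fits_in_place(old_compressed: bytes, new_data: bytes, level: int = 6) -> bool:
    """True if the recompressed data still fits where the old copy was stored."""
    return len(zlib.compress(new_data, level)) <= len(old_compressed)


# Repetitive, record-like data (a stand-in for database pages) versus
# random bytes (a stand-in for encrypted or already-compressed content).
structured = b"id=000042;status=ACTIVE;region=EMEA;" * 1000
random_like = os.urandom(len(structured))
```

Calling `compression_ratio(structured)` yields a large ratio, while `compression_ratio(random_like)` stays at or just below 1, which is why savings depend so heavily on data type. `fits_in_place` models the write problem: if a block that compressed well is overwritten with less compressible data, the result no longer fits in the original location and must be relocated.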

Data deduplication is also a simple technology for users to implement, requiring no additional management overhead. Savings are realized by identifying repeated blocks of identical data, removing the duplicates and placing logical pointers to the single-instance physical copy.

There are two ways in which duplicates are identified: inline or via post-processing. Inline dedupe identifies duplicate copies of data as they’re written to the storage array, usually by computing a hash of each block and looking that fingerprint up in a hash table of blocks already stored. The inline technique requires more processing overhead and can introduce additional latency into the I/O operation; however, it’s more space efficient and can result in less back-end I/O, because duplicate data doesn’t need to be physically written to disk.
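The inline approach can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any particular array works: the `InlineDedupeStore` class is hypothetical, SHA-256 is an assumed choice of hash, and a production system might also compare blocks byte for byte to guard against hash collisions.

```python
import hashlib


class InlineDedupeStore:
    """Toy inline dedupe: each block is fingerprinted before it is stored."""

    def __init__(self):
        self._blocks = {}    # fingerprint -> single physical copy of the block
        self._refs = {}      # fingerprint -> number of logical pointers to it
        self._pointers = {}  # logical block address -> fingerprint
        self.physical_writes = 0

    def write(self, lba: int, data: bytes) -> None:
        fingerprint = hashlib.sha256(data).digest()
        previous = self._pointers.get(lba)
        if previous == fingerprint:
            return  # identical content rewritten in place; nothing changes
        if fingerprint in self._blocks:
            self._refs[fingerprint] += 1  # duplicate: add a pointer only
        else:
            self._blocks[fingerprint] = data  # new content: one physical write
            self._refs[fingerprint] = 1
            self.physical_writes += 1
        self._pointers[lba] = fingerprint
        if previous is not None:
            self._release(previous)

    def _release(self, fingerprint: bytes) -> None:
        """Drop one pointer and free the block when nothing references it."""
        self._refs[fingerprint] -= 1
        if self._refs[fingerprint] == 0:
            del self._refs[fingerprint]
            del self._blocks[fingerprint]

    def read(self, lba: int) -> bytes:
        return self._blocks[self._pointers[lba]]

    def ratio(self) -> float:
        """Logical blocks stored divided by physical blocks kept."""
        if not self._blocks:
            return 1.0
        return len(self._pointers) / len(self._blocks)
```

The hashing and lookup happen on every write, which is where the extra processing overhead and latency come from. The payoff is that a duplicate write only adds a pointer and never reaches the back-end disks.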

Post-processing dedupe scans for duplicate blocks of data asynchronously, as a background task that runs independently of normal I/O operations. This method requires additional storage to accommodate the newly written data before it’s deduplicated, so it isn’t as space efficient as inline processing. However, it does have less impact on host latency. Care needs to be taken over when the dedupe process is scheduled so that it doesn’t affect host I/O performance.
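For comparison, post-processing can be sketched as a separate pass over data that has already landed on disk in full. Again this is a hedged illustration: the `post_process_dedupe` function and its return shape are hypothetical, and SHA-256 is an assumed fingerprint.

```python
import hashlib
from typing import Dict, Tuple


def post_process_dedupe(
    landed: Dict[int, bytes],
) -> Tuple[Dict[int, str], Dict[str, bytes], int]:
    """Background pass over blocks that hosts have already written in full.

    landed maps logical block address -> block data as stored before the pass.
    Returns (pointers, unique_blocks, blocks_reclaimed).
    """
    unique_blocks: Dict[str, bytes] = {}
    pointers: Dict[int, str] = {}
    for lba, data in landed.items():
        fingerprint = hashlib.sha256(data).hexdigest()
        # Keep the first copy of each distinct block; later copies become pointers.
        unique_blocks.setdefault(fingerprint, data)
        pointers[lba] = fingerprint
    blocks_reclaimed = len(landed) - len(unique_blocks)
    return pointers, unique_blocks, blocks_reclaimed
```

Until the pass runs, the array needs capacity for every block in `landed`, which is the additional storage requirement noted above. The host write path does no hashing at all, so the cost moves from write latency to the background scan.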

Savings from deduplication vary and can range from 2:1 to 10:1, depending on industry segment and the data itself. For example, virtual desktop infrastructure (VDI) and virtual server deployments see good benefits from deduplication where virtual machines and desktops have been cloned from a single gold master.
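A ratio converts to a percentage of capacity saved as 1 - 1/ratio, so 2:1 corresponds to 50% and 10:1 to 90%. The Python sketch below applies that arithmetic to the cloned-desktop case. The function names and the figures in the usage note are hypothetical, and the model assumes every clone shares the gold master's blocks exactly and differs only in its own unique blocks.

```python
def space_saved(ratio: float) -> float:
    """Convert a dedupe ratio (e.g. 10 for 10:1) to the fraction of space saved."""
    if ratio < 1:
        raise ValueError("a dedupe ratio cannot be below 1:1")
    return 1 - 1 / ratio


def clone_dedupe_ratio(clones: int, master_blocks: int, unique_per_clone: int) -> float:
    """Ideal dedupe ratio for desktops cloned from a single gold master."""
    # Logical view: every clone appears to hold the master image plus its own changes.
    logical = clones * (master_blocks + unique_per_clone)
    # Physical view: one shared copy of the master plus each clone's unique blocks.
    physical = master_blocks + clones * unique_per_clone
    return logical / physical
```

For example, `clone_dedupe_ratio(20, 10_000, 1_000)` models 20 desktops that each add 1,000 unique blocks to a 10,000-block master, giving 220,000 logical blocks against 30,000 physical ones, or roughly 7.3:1. The ratio falls as clones diverge from the master over time.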

This was first published in October 2012
