Permabit CEO Tom Cook claims the company has several OEM design wins with major storage vendors for its sub-file deduplication technology, which is delivered as a software development kit (SDK). Albireo and similar data reduction software from other vendors could speed adoption of a technology that has already attracted a great deal of attention for backup data.
Industry analysts say interest is high for primary data deduplication, and most leading storage vendors are working to follow NetApp Inc.'s lead with its deduplication feature. NetApp deduplication is considered a big selling point for its FAS arrays in virtual server environments, and a big reason for its recent gains in storage market share.
EMC Corp. offers single-instance storage and compression with its Celerra unified storage platform, and has dedupe backup technology from its Data Domain and Avamar acquisitions. Dell Inc., Hewlett-Packard (HP) Co., Hitachi Data Systems, IBM and smaller vendors are believed to be evaluating data deduplication software from Permabit and others that they can embed for primary storage.
Permabit's Cook expects Albireo to be in shipping OEM products by the end of this year. While appliances are necessary to dedupe large images, Cook said storage vendors have told him that mainstream needs are better met with dedupe embedded in the storage array.
"OEMs told us 'We have a file system, we have a block storage system, we want to integrate deduplication with that, so foremost you must be able to do that and in an embedded way,'" he said.
Albireo — named after a double star that appears like a single star to the naked eye — supports block, file and unified storage infrastructure, according to Permabit chief technology officer (CTO) Jered Floyd. It uses data deduplication IP from Permabit's Enterprise Archive inline deduplication for secondary storage, but expands it to run inline, post-process or in a parallel configuration.
VMware a big driver for primary data deduplication
Permabit executives claim Albireo High Performance Data Optimization Software can scale to petabytes of data, and that its variable-chunk segmentation could reduce "typical" data sets by more than 50% and some data sets, such as VMware images, by more than 90%.
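For readers unfamiliar with variable-chunk segmentation, the sketch below shows the general idea in Python. It is not Permabit's algorithm, which the company has not described here; the Gear-style rolling hash, the table seed, the 8-bit cut condition, the 32-byte minimum chunk and all function names are assumptions chosen for illustration. Chunk boundaries are picked from the content itself, so inserting one byte at the front of a file disturbs only the first chunk, whereas fixed-size blocks all shift and stop matching.

```python
import hashlib
import random

# Hypothetical 256-entry table of random 32-bit values for a Gear-style rolling hash.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]

def chunk(data, bits=8, min_size=32):
    """Cut data where a hash of the last 32 bytes has `bits` zero top bits."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h covers only the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def fixed(data, size=288):
    """Fixed-size blocks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared(a_chunks, b_chunks):
    """Count distinct chunk fingerprints that appear in both lists."""
    fingerprint = lambda c: hashlib.sha256(c).digest()
    return len({fingerprint(c) for c in a_chunks} & {fingerprint(c) for c in b_chunks})

data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(65536))
shifted = b"X" + original  # same content with one byte inserted at the front

shared_fixed = shared(fixed(original), fixed(shifted))
shared_variable = shared(chunk(original), chunk(shifted))
print("chunks shared with fixed-size blocks:", shared_fixed)
print("chunks shared with variable chunks:  ", shared_variable)
```

Production systems typically also cap the maximum chunk size and tune the average; those refinements are left out to keep the sketch short.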
Brian Babineau, a senior consulting analyst at Enterprise Strategy Group, said the ability to reduce VMware images is a major selling point for primary data deduplication.
"The VMware influence in primary dedupe should not be underestimated," he said. "VMware creates a lot of duplicate content in its configuration files, and those redundant configuration files increase as VMware in the environment increases.
"There's end-user demand for these types of solutions," he continued, "and big OEMs don't want to miss out on this opportunity."
Primary storage data dedupe must meet performance, integrity needs
Still, any primary storage data dedupe product will have to prove it can handle the chore while preserving data integrity and without introducing read and write latency.
Permabit's Floyd claims Albireo can maintain data integrity because data written to disk isn't altered, and the reduction takes place out of the data path. When parallel processing is used, deduped data doesn't have to be rehydrated when it's accessed.
According to Floyd, Albireo uses only 3.5 bytes in memory per index entry, and as few as 40 bytes on disk per index entry. This allows it to operate effectively in small memory-constrained storage systems and appliances.
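To put those per-entry figures in perspective, here is a rough back-of-the-envelope calculation in Python. The 4 KB average chunk size and the model of one index entry per unique chunk are assumptions made for illustration, not figures from Permabit, and the function name is hypothetical. The 40-byte disk figure is the "as few as" lower bound Floyd cited.

```python
def index_footprint(data_bytes, chunk_bytes, ram_per_entry=3.5, disk_per_entry=40):
    """Estimate index size, assuming one index entry per unique chunk."""
    entries = -(-data_bytes // chunk_bytes)  # ceiling division
    return entries, entries * ram_per_entry, entries * disk_per_entry

TB = 1024 ** 4  # binary terabyte
entries, ram_bytes, disk_bytes = index_footprint(1 * TB, 4096)

print("index entries:", entries)
print("memory (MB):  ", ram_bytes / 1024 ** 2)
print("disk (GB):    ", disk_bytes / 1024 ** 3)
```

Under those assumptions, 1 TB of unique data needs about 268 million index entries, which works out to roughly 896 MB of memory and at least 10 GB of on-disk index (counting in binary units).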
With the inline option, data flows through the Albireo library before going to disk. Post-process deduplication writes data to disk first, then scans for and eliminates duplicated data. The parallel option sends data to disk while a copy is still in memory, and applies deduplication updates the same way as post-processing but without having to read the data back off disk. Each method has a different trade-off between latency and reduction efficiency.
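The differences among the three modes can be illustrated with a toy block store in Python. This is a simplified model, not the Albireo SDK: the class and method names are hypothetical, SHA-256 fingerprints are an assumption, each logical address is assumed to be written only once, and the parallel path runs sequentially here where a real system would presumably do the work concurrently.

```python
import hashlib

class ToyStore:
    """Toy block store; each logical address is assumed to be written once."""

    def __init__(self):
        self.index = {}       # fingerprint -> id of the block kept for that content
        self.blocks = {}      # block id -> bytes physically on "disk"
        self.refs = {}        # logical address -> block id
        self.disk_writes = 0
        self.disk_reads = 0

    def _put(self, data):
        block_id = self.disk_writes  # ids are a running write counter
        self.blocks[block_id] = data
        self.disk_writes += 1
        return block_id

    def _dedupe(self, addr, data):
        """Point addr at the first stored copy of data and free any extra copy."""
        block_id = self.refs[addr]
        kept = self.index.setdefault(hashlib.sha256(data).digest(), block_id)
        if kept != block_id:
            del self.blocks[block_id]
            self.refs[addr] = kept

    def write_inline(self, addr, data):
        # Consult the index before touching disk; duplicates are never written.
        fp = hashlib.sha256(data).digest()
        if fp not in self.index:
            self.index[fp] = self._put(data)
        self.refs[addr] = self.index[fp]

    def write_post_process(self, addr, data):
        # Write first; deduplication happens later in scan().
        self.refs[addr] = self._put(data)

    def scan(self):
        # Post-process pass: read every block back from disk, then deduplicate.
        for addr in sorted(self.refs):
            data = self.blocks[self.refs[addr]]
            self.disk_reads += 1
            self._dedupe(addr, data)

    def write_parallel(self, addr, data):
        # Write to disk, then deduplicate from the copy still in memory.
        self.refs[addr] = self._put(data)
        self._dedupe(addr, data)

def run(mode):
    store = ToyStore()
    workload = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
    for addr, data in enumerate(workload):
        getattr(store, "write_" + mode)(addr, data)
    if mode == "post_process":
        store.scan()
    return len(store.blocks), store.disk_writes, store.disk_reads

for mode in ("inline", "post_process", "parallel"):
    print(mode, run(mode))  # (blocks kept, disk writes, disk reads)
```

All three modes end up keeping two physical blocks for the four writes. In this model the inline path writes only the two unique blocks but does its index lookup before every write, the post-process path writes all four and reads all four back, and the parallel path writes all four but needs no read-back.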
Dave Russell, a research vice president at Gartner, said the biggest customer concerns about dedupe in primary storage are data integrity, performance and cost.
"Performance is a significant obstacle, but data integrity is an even more foundational concern before getting to the issue of performance," he said. "For some companies, manageability and scale can be even more top of mind than performance. It really depends on the workload and service level."
However, Russell said he expects primary storage data dedupe will catch on fast because organizations are already familiar with the technology from backup products, as well as the data reduction offered by NetApp and EMC on primary data.
"It's been amazing to witness how fast deduplication went from a 'science experiment' to mainstream in the backup use case, and primary storage -- while perhaps not being adopted at the same rate -- is being considered more and more," he said. "Performance will get better and become less and less of an issue as most of the algorithms are limited by CPU, which is getting very inexpensive. But even in cases where memory and disk spindles play a role in performance, those issues are increasingly getting more cost-effective to overcome."