This is a subject that computer science students research and explain to get their PhDs. For an e-mail question, the best I can do is give you a very light overview.
Dictionary-based compression algorithms usually build a dictionary (a table of character patterns) in memory as the data is scanned, looking for repeated information (some implementations use a static dictionary, so it does not have to be built dynamically). When a pattern is recognized (a look-up in the dictionary), that string of information is replaced by a much shorter but uniquely identifiable code. This results in compression of the overall data. The size of the dictionary and the speed at which the scan is done are implementation decisions that differ among vendors; it's a trade-off between cost and latency. There are many techniques for doing this. The most popular is the Lempel-Ziv family of algorithms, of which there are several versions. Run-length encoding is a simpler technique in the same spirit that looks for runs of repeated characters. Huffman encoding uses the mathematical probability of each character's occurrence to represent the most common characters with shorter bit strings.
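To make the dictionary-building idea concrete, here is a toy LZ78-style encoder and decoder in Python. This is only a sketch of the general technique, not any particular vendor's implementation: the dictionary grows as the input is scanned, and each repeated phrase is replaced by a short (dictionary index, next character) pair.

```python
def lz78_compress(data: str):
    """Toy LZ78-style compressor: builds a phrase dictionary as the input
    is scanned and emits (dictionary index, next character) pairs."""
    dictionary = {"": 0}          # phrase -> index; the empty phrase is index 0
    output = []
    phrase = ""
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate    # keep extending the current match
        else:
            output.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:                    # flush any leftover matched phrase
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

def lz78_decompress(pairs):
    """Rebuilds the same dictionary on the fly to reverse the encoding."""
    phrases = [""]
    out = []
    for idx, ch in pairs:
        phrase = phrases[idx] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

On a repetitive input such as `"abababab"`, the pair list comes out shorter than the input because later pairs stand in for whole phrases that were seen earlier, which is exactly where the compression comes from.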
This is a whole computer science discipline with many very good textbooks. I suggest buying a couple of those and reading further.
Evaluator Group, Inc.