Discussing dictionary-based compression algorithms
This is a subject on which computer science students write their PhD theses. For an e-mail question, the best I can do is give you a very light overview.
Dictionary-based compression algorithms usually build a dictionary (a table of character patterns) in memory as the data is scanned for repeated information. (Some implementations use a static dictionary, so it does not have to be built dynamically.) When a string of data matches a pattern already in the dictionary, it is replaced by a much shorter but uniquely identifiable code, which compresses the overall data. The size of the dictionary and the speed at which the scan is done are implementation decisions made by the different vendors; it's a trade-off between cost and latency.

There are many techniques for doing this. The most popular family of compression algorithms is Lempel-Ziv, of which there are several versions. Run-length encoding is a related but simpler technique that replaces runs of repeated characters with a count and a single copy of the character. Huffman encoding uses the mathematical probability of each character's occurrence to represent the more frequent characters with shorter bit strings.
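To make the dictionary-building idea concrete, here is a minimal sketch in the style of LZ78, one of the Lempel-Ziv variants. It is an illustration of the general technique, not any particular vendor's implementation: the dictionary grows dynamically as the data is scanned, and each matched phrase is replaced by a short (dictionary index, next character) pair.

```python
def lz78_compress(data):
    """Toy LZ78-style compressor: emits (dictionary index, next char) pairs."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                        # keep extending the match
        else:
            output.append((dictionary[phrase], ch))   # emit the short code
            dictionary[candidate] = len(dictionary)   # grow the dictionary
            phrase = ""
    if phrase:                                        # flush any trailing match
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

def lz78_decompress(pairs):
    """Rebuilds the same dictionary on the fly to reverse the compression."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

# Repeated patterns collapse into references to earlier dictionary entries:
codes = lz78_compress("ABABABA")   # -> [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```

Note that the decompressor needs no copy of the dictionary; it reconstructs the identical table from the codes themselves, which is part of what makes this family of algorithms practical.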
This is a whole computer science discipline with many very good textbooks. I suggest buying a couple of those and reading further.
Randy Kerns
Evaluator Group, Inc.
Editor's note: Do you agree with this expert's response? If you have more to share, post it in our Storage Networking discussion forum.