Discussing dictionary-based compression algorithms

What are dictionary-based compression algorithms and how do they function? Please give the details of their functioning.

This is a subject that computer science students research and write about to earn their PhDs. For an e-mail question, the best I can do is give you a very light overview.

Dictionary-based compression algorithms usually build a dictionary (a table of character patterns) in memory as the data is scanned for repeated information. Some implementations use a static dictionary, so it does not have to be built dynamically. When a pattern is recognized (through a look-up in the dictionary), that string of information is replaced by a much shorter but uniquely identifiable code. This results in compression of the overall data. The size of the dictionary and the speed at which the scan is done are implementation decisions that vary among vendors. It's a trade-off between cost and latency. There are many techniques for doing this. The most popular family of dictionary-based algorithms is Lempel-Ziv, of which there are several versions (LZ77, LZ78 and LZW, for example). Run-length encoding is a simpler, related technique that looks for runs of a repeated character and replaces each run with the character and a count. Huffman encoding, by contrast, is a statistical method rather than a dictionary method: it uses the probability of each character's occurrence to represent the more frequent characters with shorter bit strings.
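To make the dictionary idea concrete, here is a minimal sketch of an LZW-style coder, one member of the Lempel-Ziv family. It is an illustration only, not any vendor's implementation: the function names `lzw_compress` and `lzw_decompress` are made up for this example, the dictionary is allowed to grow without limit, and the output is a plain list of integer codes rather than a packed bit stream. Real products bound the dictionary size and pack the codes, which is where the cost and latency trade-offs come in.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of dictionary codes (illustrative LZW sketch)."""
    # Seed the dictionary with one entry per possible byte value (codes 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the match while the dictionary already knows it.
            current = candidate
        else:
            # Emit the code for the longest known match, then learn the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes; the dictionary is reconstructed on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry the compressor had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid code: {code}")
        out.append(entry)
        # Mirror the compressor: learn previous match plus first byte of this one.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    encoded = lzw_compress(sample)
    print(len(sample), "bytes in,", len(encoded), "codes out")
    print(lzw_decompress(encoded) == sample)
```

Note that the decompressor never receives the dictionary. It rebuilds the same table from the code stream, which is why a dynamically built dictionary adds no storage overhead to the compressed data. The repeated substrings in the sample are what let later codes stand in for several bytes at once.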

This is a whole computer science discipline with many very good textbooks. I suggest buying a couple of those and reading further.

Randy Kerns
Evaluator Group, Inc.
