This is a subject that computer science students research and explain to get their PhDs. For an e-mail question, the best I can do is give you a very light overview.
Dictionary-based compression algorithms usually build a dictionary (a table of character patterns) in memory as the data is scanned, looking for repeated information (some implementations use a static dictionary, so it does not have to be built dynamically). When a pattern is recognized (by a look-up in the dictionary), that string of information is replaced by a much shorter but uniquely identifiable string. This results in a compression of the overall data. The size of the dictionary and the speed at which the scan is done are implementation decisions that vary among vendors; it's a trade-off between cost and latency. There are many techniques for doing this. The most popular compression algorithm is Lempel-Ziv, of which there are several versions. Run-length encoding is a simpler, related technique that replaces runs of repeated characters with a count. Huffman encoding uses the mathematical probability of each character's occurrence to represent more frequent characters with shorter bit strings.
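To make the dictionary idea concrete, here is a minimal sketch in the style of LZ78 (one of the Lempel-Ziv variants). This is an illustrative toy, not any vendor's implementation; the function names and the (index, character) token format are my own. The key point it shows is the dictionary being built dynamically as the data is scanned.

```python
def lz78_compress(data: str) -> list[tuple[int, str]]:
    """Toy LZ78-style compressor: the dictionary is built on the fly
    while scanning. Each output token is (index of the longest phrase
    already in the dictionary, next literal character)."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    tokens = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn a new phrase
            phrase = ""
    if phrase:                                # flush a trailing match
        tokens.append((dictionary[phrase[:-1]], phrase[-1]))
    return tokens

def lz78_decompress(tokens: list[tuple[int, str]]) -> str:
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    out = []
    for idx, ch in tokens:
        p = phrases[idx] + ch
        phrases.append(p)
        out.append(p)
    return "".join(out)
```

On repetitive input such as `"abababab"`, the tokens quickly start referring back to longer and longer dictionary entries, which is where the compression comes from; real implementations differ mainly in how large the dictionary is allowed to grow and how fast the look-ups are, which is exactly the cost-versus-latency trade-off mentioned above.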
This is a whole computer science discipline with many very good textbooks. I suggest buying a couple of those and reading further.
Randy Kerns, Evaluator Group, Inc.