This is a subject that computer science students research and write dissertations on to earn their PhDs. For an e-mail question, the best I can do is give you a very light overview.
Dictionary-based compression algorithms usually build a dictionary of character strings (patterns) in memory as the data is scanned for repeated information. (Some implementations use a static dictionary, so it does not have to be built dynamically.) When a string is found in the dictionary, it is replaced by a much shorter reference, such as an index into the dictionary, that still uniquely identifies it. Doing this across all the repeated strings compresses the data overall. The size of the dictionary and the speed of the scan are implementation decisions that differ from vendor to vendor. A larger dictionary and a more thorough scan typically find more repetition but need more memory and processing time, so it's a trade-off between cost and latency.

There are many techniques for doing this. The most popular is the Lempel-Ziv family of algorithms, of which there are several versions (LZ77, LZ78 and LZW, for example). Run-length encoding is a simpler related technique that looks for runs of repeated characters. Huffman encoding uses the probability of each character's occurrence to represent frequent characters with shorter bit strings.
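To make the dynamic-dictionary idea concrete, here is a minimal Python sketch of LZW, one of the Lempel-Ziv variants, plus a tiny run-length encoder for contrast. It is an illustration only, under stated assumptions: it works on bytes, starts the dictionary with all 256 single-byte strings, and returns codes as plain Python integers without the bit packing a real implementation would do. The function names are hypothetical and this is not how any particular storage vendor implements compression.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as integer codes, growing the dictionary as data is scanned."""
    # Static starting point: one entry for every single byte (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the match while the string is already known.
            current = candidate
        else:
            # Emit the code for the longest known string...
            codes.append(dictionary[current])
            # ...and learn the new, longer string for next time.
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes; the decoder reconstructs the same dictionary."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being built right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encoding: collapse runs of the same byte into (count, byte) pairs."""
    runs: list[tuple[int, int]] = []
    for byte in data:
        if runs and runs[-1][1] == byte:
            runs[-1] = (runs[-1][0] + 1, byte)
        else:
            runs.append((1, byte))
    return runs


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes")
    assert lzw_decompress(codes) == sample
    print(rle_encode(b"AAAABBBCCD"))
```

On the 24-byte sample, repeated substrings such as "TO" and "BE" are replaced by single codes the second time they appear, so fewer codes than input bytes are emitted. The decoder never receives the dictionary itself; it rebuilds it from the code stream, which is why dynamically built dictionaries cost nothing extra to transmit.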
This is a whole computer science discipline with many very good textbooks. I suggest buying a couple of those and reading further.
Randy Kerns, Evaluator Group, Inc.