At the Crypto 2004 conference in Santa Barbara, Calif. last week, several papers were presented that demonstrated vulnerabilities in a popular algorithm used to create digital signatures.
The flaw in the MD5 algorithm could
"The results are preliminary, but users subject to SEC 17a-4 should check this out carefully," said Peter Gerr, analyst with the Milford, Mass.-based Enterprise Strategy Group. Rule 17a-4 states that data must be stored in a non-rewritable, non-erasable form. Gerr advised users to wait until the National Institute of Standards and Technology (NIST) officially confirms that use of MD5 in single-instance storage systems is non-compliant before taking any action. "Until then it tends to remain FUD," he said.
Single-instance storage features, like the one used by the Centera device, purport to store only one copy of any file with a unique content address, regardless of how many times duplicate objects are submitted for storage. Until now this seemed like a great idea from the point of view of storage savings.
However, single-instancing rests on the assumption that two files with the same content address must have identical contents. That assumption no longer holds when the single-instancing storage feature is enabled and the content address is computed using the MD5 cryptographic algorithm.
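The assumption can be made concrete with a minimal sketch of a single-instance store. The class and method names here are hypothetical and are not taken from any vendor's product; the sketch only assumes that the content address is the MD5 digest of the object's bytes.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: one stored copy per content address."""

    def __init__(self):
        self._objects = {}  # content address -> stored bytes

    @staticmethod
    def content_address(data: bytes) -> str:
        # The address is derived only from the bytes of the object.
        return hashlib.md5(data).hexdigest()

    def put(self, data: bytes) -> str:
        address = self.content_address(data)
        # Single-instancing: if the address is already present, the new
        # bytes are assumed to be identical and are not written again.
        if address not in self._objects:
            self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def object_count(self) -> int:
        return len(self._objects)


store = SingleInstanceStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # duplicate submission
```

Both calls return the same address and only one object is kept. The `put` method never compares the incoming bytes with what is already stored, which is exactly where the "same address means same contents" assumption lives.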
EMC responded that Centera uses two different types of naming schemes -- one based on MD5 and another based on MD5 plus, an EMC-developed algorithm that also incorporates time and date stamps into the content address. Regular background checks run the algorithm across the object to make sure that it is the same as the original. Centera stores a second copy with the same content address for these verification purposes. In addition, MD5 plus gives users the option to up the encryption to 128-bit or 256-bit and turn off the single-instance storage feature.
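As a rough illustration of the idea EMC describes, the sketch below combines an MD5 digest with a storage timestamp and re-checks an object against its address. The address layout and the function names are assumptions made for this example; the article does not document the actual MD5 plus format.

```python
import hashlib
from datetime import datetime, timezone


def timestamped_address(data: bytes, stored_at: datetime) -> str:
    # Hypothetical layout: <md5 of content>-<UTC timestamp with microseconds>.
    digest = hashlib.md5(data).hexdigest()
    stamp = stored_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return f"{digest}-{stamp}"


def passes_background_check(address: str, data: bytes) -> bool:
    # Recompute the digest and compare it with the digest part of the address.
    digest = address.split("-", 1)[0]
    return hashlib.md5(data).hexdigest() == digest


t1 = datetime(2004, 8, 23, 9, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2004, 8, 23, 9, 0, 1, tzinfo=timezone.utc)
addr1 = timestamped_address(b"same content", t1)
addr2 = timestamped_address(b"same content", t2)
```

With the timestamp included, identical content stored at two different moments gets two different addresses, so a later object is not silently folded into an earlier one. Note that a check which only recomputes the same digest cannot tell two colliding inputs apart; it only detects changes that alter the digest.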
Roy Sanford, a vice president in the Centera division, likened the probability of someone creating two files at the same exact time, on the same exact entry node, with exactly the same content, to someone running the 100 meter dash in two seconds. "At some point they could, but is it something that people should lose sleep over today? No," he said.
Sanford also pointed out that Centera is responsible for the integrity of the data stored on the system, which he said is distinct from the security of the information; the latter falls to network security and application security.
"Vendors of products that rely on MD5 will say that there are more stars in the universe than there are chances of a unique address being created twice, but that's only true if there isn't a weakness in the algorithm," said Will McGovern, chief architect of Network Appliance Inc.'s SnapLock product, which does not use MD5. "Cracking algorithms is like an arms race," he said. "Someone will always get around existing measures." He noted that one way to address this vulnerability is to turn off the single-instancing storage feature.
NetApp's file system works like a regular C drive in that users cannot use the same file name twice to store a file. The company claims its SnapLock software prevents users from erasing files that already exist.
Permabit Inc., another provider of single-instance storage for archival purposes, uses the SHA-256 algorithm instead of MD5. "This is the only acceptable and recommended algorithm for use in federal information processing," said a spokesman for the company.
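For comparison, here is a small sketch of a content-address function in which the hash algorithm is a parameter. The function name is hypothetical and says nothing about how Permabit's product is built; it only shows the difference in address length between MD5 (128 bits) and SHA-256 (256 bits).

```python
import hashlib


def content_address(data: bytes, algorithm: str = "sha256") -> str:
    # hashlib.new accepts the algorithm by name, so the hash can be swapped
    # without touching the rest of the (hypothetical) storage code.
    return hashlib.new(algorithm, data).hexdigest()


md5_address = content_address(b"archived message", "md5")
sha256_address = content_address(b"archived message")
```

The MD5 address is 32 hexadecimal characters long and the SHA-256 address is 64.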
To see how the Centera system works, users can check out cascommunity.org.
An MD5 archive corruption scenario
- A hacker or rogue employee at a company could, using freely available MD5 hash utilities from the Web, generate hashes for the files they want to obscure. By following the rest of this procedure, that person could then send data they don't want archived.
- Using the mechanisms described at the Crypto 2004 conference (free utilities will soon appear), the hacker could reverse engineer a small binary file with the same hash as those generated for the files to be obscured.
- Now the hacker sends themselves an e-mail (routed through the server being archived on Centera using a single-instancing archive) with the reverse-engineered file as an attachment. That action "seeds" the single-instancing archive with "garbage data."
- The hacker finally sends the suspicious message, with the attachments they don't want the SEC to see, via the same e-mail server to their customer.
- The Centera will look at the hash of these most recent attachments and detect that they already "exist" via a single-instance hash match.
- Because the system believes the real attachments already exist, it will simply ignore the new attachments and re-point the new reference to the originally archived attachments, which are actually reverse-engineered garbage that has the same hash -- known as a "hash collision" -- causing silent data loss.
- When the SEC or the auditors/supervisors attempt to retrieve what they think are real attachments from the single-instancing archive system, they will only get the garbage attachments which were maliciously "seeded." There is no way to get to the real attachments in question because they were never really stored.
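The scenario above can be walked through in a short simulation. To keep it runnable, the sketch makes one loud assumption: the content address is only the first 16 bits of the MD5 digest, so a colliding input can be found by plain brute force. It does not implement the techniques presented at Crypto 2004, and all function names are hypothetical.

```python
import hashlib
from itertools import count


def weak_address(data: bytes) -> str:
    # Assumption for illustration only: keep just 16 bits of the MD5 digest
    # so that a colliding input can be found quickly by brute force.
    return hashlib.md5(data).hexdigest()[:4]


def find_colliding_garbage(target: bytes) -> bytes:
    # Stand-in for the attacker's tooling: try inputs until one shares the
    # target's (truncated) address while having different contents.
    wanted = weak_address(target)
    for i in count():
        candidate = b"garbage-%d" % i
        if candidate != target and weak_address(candidate) == wanted:
            return candidate


archive = {}  # content address -> stored bytes


def archive_attachment(data: bytes) -> str:
    address = weak_address(data)
    if address not in archive:  # single-instance match: skip the write
        archive[address] = data
    return address


real = b"attachment the sender wants to hide"
garbage = find_colliding_garbage(real)

archive_attachment(garbage)           # seed the archive with garbage data
reference = archive_attachment(real)  # the real attachment is "deduplicated"
retrieved = archive[reference]        # what an auditor later gets back
```

After the run, `retrieved` holds the garbage bytes and the real attachment appears nowhere in `archive`, which mirrors the silent data loss described in the last bullet.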