This article can also be found in the Premium Editorial Download "Storage magazine: Top 15 Storage hardware and software Products of the Year 2006."
The WORST data
Although vendors have been clever at finding a place, and even a replacement, for WORM media in the market, it remained a technology in search of a purpose for many years. Truth be told, most organizations have a desperate need for another type of storage, one that can inexpensively store unchanging data forever.
For example, consider the digitization of paperwork. A company scans filled-out forms and writes the images as files to a file server or CAS device. These images never change and most may never be accessed; in the industry we jokingly call this write-only data. But the data will remain online and accessible for years, because someone will occasionally need to view a file.
By analogy with WORM, we can call this data type WORST. Key aspects include online availability (no one wants to wait on the phone while a tape is loaded) and lengthy endurance without modification. Although the data is online and its volume is large, performance requirements are likely to be low.
The chances are good that you have a lot of WORST data in your data center. Typical applications include scanned images and other media files, engineering reference documents such as schematics and parts lists, and captured scientific and technical data. All of these consume large amounts of disk space, but aren't edited like office documents or source code. And all are likely to have a very long shelf life.
So what storage products best suit WORST data? The
If you use NAS, make sure you create an extensible file-system structure because you're likely to grow it to millions of files and hundreds of terabytes of data. The data these applications produce is often highly structured, much more so than regular user files. Rather than relying on people to comply with a directory structure, you can simply program the data-acquisition application to follow one. Pick a layout with a few high-level directories that could later be split across multiple devices if needed, and make sure that the files will be balanced evenly across all of them.
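A minimal sketch of that idea in Python, assuming the acquisition application computes the storage path itself. The names worst_path and TOP_LEVEL_DIRS, the /archive root, the .tif extension and the choice of hashing a document ID are all hypothetical; any scheme that yields a stable, evenly balanced layout would serve.

```python
import hashlib
from pathlib import PurePosixPath

# Assumption: a small, fixed number of high-level directories, each of
# which could later be relocated to its own device.
TOP_LEVEL_DIRS = 16


def worst_path(doc_id: str, root: str = "/archive") -> PurePosixPath:
    """Derive a stable storage path for a scanned document.

    The path depends only on doc_id, so the same ID always maps to the
    same location. Hashing the ID spreads files roughly evenly across
    the top-level directories regardless of how IDs are assigned.
    Assumes doc_id contains no path separators.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    top = int(digest[:8], 16) % TOP_LEVEL_DIRS  # which high-level directory
    sub = digest[8:10]                          # 256 subdirectories in each
    return PurePosixPath(root) / f"vol{top:02d}" / sub / f"{doc_id}.tif"


if __name__ == "__main__":
    print(worst_path("form-000123"))
```

Because the top-level directory is derived from the ID rather than chosen by a person, each one could later be moved to a separate device without renaming the files beneath it.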
CAS is an interesting alternative. It stores files based on their intrinsic content, eliminating the question of directory structure, and most CAS devices are highly extensible and low cost. But both your writing and reading apps must support the CAS device's API. This isn't a problem for a brand-new application, but it could be a significant challenge if you're bringing in new storage to support an existing data set.
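To illustrate what storing files "based on their intrinsic content" means, here is a toy sketch in Python. It is not any vendor's CAS API: the class ToyContentStore, its put and get methods, and the use of SHA-256 as the address are assumptions made for illustration only.

```python
import hashlib


class ToyContentStore:
    """In-memory stand-in for a content-addressed store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is computed from the content itself, so the caller
        # never chooses a directory or file name.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address and is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        # Raises KeyError if nothing was stored under this address.
        return self._objects[address]


if __name__ == "__main__":
    store = ToyContentStore()
    addr = store.put(b"scanned form image bytes")
    assert store.get(addr) == b"scanned form image bytes"
```

The sketch also shows why both the writing and reading applications have to change: the writer must record the returned address somewhere, and the reader must present that address instead of a path.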
Of course, WORST is just one type of data to consider. There are many other data types out there that could be served with some fresh thinking. What's the best way to store remote data, collaborative office files or temporary data? Perhaps if we think critically about the unique requirements of these data types rather than simply what storage we have today, we can develop a truly effective infrastructure for storing them.
This was first published in February 2007