So now, companies have to think realistically about how to put platforms in place that can store petabytes of data over time. The older, more monolithic architectures don't provide a cost-effective way to do that. So we're seeing the introduction of storage grids and scale-out approaches targeted at unstructured data, things like Web 2.0, and also at secondary applications like backup and archive.
These dense storage platforms use a very different approach from the monolithic architectures of the past. You can pay as you grow, and you can add performance, I/O or capacity independently, which gives you a lot of flexibility in building the configuration that best meets your performance requirements.
The key issues to look for are petabyte-class scalability and the ability to maintain high levels of I/O performance at that scale. You'll also want to look at data reliability, and at data deduplication technologies that are integrated into the platform, so that you can use the deduplication multiplier to lower the overall cost of storage.
For example, if you're paying $8 a gigabyte for SATA-based storage,
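As a rough illustration of how a deduplication multiplier affects cost, here is a minimal Python sketch. The $8 per gigabyte figure comes from the text; the deduplication ratios and the function name `effective_cost_per_gb` are hypothetical, chosen only to show the arithmetic, and real ratios depend heavily on the data being stored.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, dedup_ratio: float) -> float:
    """Return the cost per GB of logical (pre-deduplication) data.

    dedup_ratio is the deduplication multiplier, e.g. 10 means 10:1.
    """
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return raw_cost_per_gb / dedup_ratio


if __name__ == "__main__":
    raw_cost = 8.0  # dollars per raw GB of SATA storage (figure from the text)
    # Hypothetical ratios; actual results vary by workload.
    for ratio in (1, 5, 10, 20):
        cost = effective_cost_per_gb(raw_cost, ratio)
        print(f"{ratio}:1 dedup -> ${cost:.2f} per logical GB")
```

The point of the sketch is only that the effective cost falls in direct proportion to the multiplier, which is why integrated deduplication matters when comparing platforms on price.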
There's one final area to evaluate these platforms on: whether or not they've got an integrated replication capability, which is really the way you solve backing up that data store, providing a disaster recovery solution, etc. If we're talking about a data store that's 400 TB in size, there's no way you're going to be able to back that up with tape. There has to be another approach, and the accepted way to address that these days is to replicate the data to an off-site location, such as a mirror platform.
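To see why tape struggles at this scale, a back-of-the-envelope estimate helps. The sketch below uses the 400 TB figure from the text; the 120 MB/s sustained rate per drive and the drive counts are assumptions for illustration (roughly what a tape drive of that era might sustain natively), and the function name `full_backup_days` is hypothetical. It ignores compression, tape changes and verification.

```python
SECONDS_PER_DAY = 86_400


def full_backup_days(capacity_tb: float, drive_mb_per_s: float, drives: int = 1) -> float:
    """Estimate days needed for one full backup, streaming at full speed.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    if drive_mb_per_s <= 0 or drives <= 0:
        raise ValueError("throughput and drive count must be positive")
    total_mb = capacity_tb * 1_000_000
    seconds = total_mb / (drive_mb_per_s * drives)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    capacity_tb = 400       # data store size from the text
    assumed_rate = 120.0    # MB/s per drive: an assumption, not from the text
    for drives in (1, 4, 10):  # hypothetical drive counts
        days = full_backup_days(capacity_tb, assumed_rate, drives)
        print(f"{drives} drive(s): about {days:.1f} days for a full backup")
```

Under these assumptions a single drive would need more than a month for one full pass, and even ten drives running in parallel would need several days, which is the practical reason replication is favored over tape for data stores of this size.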
This was first published in July 2008