
The data density dilemma

Find out about the dilemma of rapidly increasing disk densities combined with slower increases in disk speed.

Dr. Geoff Barrall
CTO, BlueArc Corporation
Dr. Barrall is the CTO, executive vice president and co-founder of BlueArc Corporation and the principal architect of its core technology, the SiliconServer Architecture. Prior to joining BlueArc, Dr. Barrall founded four other ventures, including one of the first Fast Ethernet companies and a successful UK consultancy business. At the consultancy, he was involved in introducing innovative networking products into UK markets, including Packeteer and NetScout. Dr. Barrall received his PhD in Cybernetics from the University of Reading in 1993.

The increase in disk densities in the computing industry is occurring at a much faster rate than the increase in disk speeds. This presents a rapidly accelerating problem that may still be under the radar for most in the storage industry.

Customers want more data storage more cheaply, driving a manufacturing trend to increase disk sizes as quickly as possible, by doubling the density per disk while costs remain steady. As larger disks become available, older, smaller disks are retired.

At first glance this would seem to be a very good thing for CIOs everywhere. Yet, there is a hidden downside to this trend, as this increase in density is occurring at a much faster rate than the increase in disk speeds. Should this continue, it will inevitably lead to a decrease in storage system performance or an underutilization of available data space.

Research shows a dramatic decrease in seek time (a good measure of disk performance in a reasonably random data environment) over the last decade, accompanied by an increase in disk capacity. The reduction in seek time is a mostly linear phenomenon, while the increase in capacity is closer to an exponential curve. This combination gives rise to a situation where the realized performance of a disk drive, relative to the amount of data it holds, is decreasing over time, because size is increasing at a faster rate than performance.

Today's vendors would agree that a certain number of disk drives must be attached to a server system to achieve an acceptable performance level per RAID volume (basically a single file system or data area). Today, that is typically a minimum of approximately seven disks.

With today's 73G Byte disk drives, this requirement forces the customer to purchase a minimum of approximately 500G Bytes of storage. When disk density increases force vendors to sell 150G Byte drives, a seven-disk system will come with more than a full terabyte of storage, and that assumes even more disks aren't required to keep the same performance level!
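The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name `minimum_volume_capacity_gb` is a hypothetical helper, and the seven-disk minimum is the approximate figure given in this article rather than a fixed rule.

```python
# Illustrative sketch: the smallest capacity a customer must buy when a
# RAID volume needs a minimum number of disks for acceptable performance.
# The seven-disk minimum is the approximate figure quoted in the article.

MIN_DISKS_PER_VOLUME = 7


def minimum_volume_capacity_gb(disk_size_gb: int,
                               disks: int = MIN_DISKS_PER_VOLUME) -> int:
    """Return the raw capacity (in G Bytes) of the smallest usable volume."""
    return disks * disk_size_gb


if __name__ == "__main__":
    for disk_size_gb in (73, 150):
        capacity = minimum_volume_capacity_gb(disk_size_gb)
        print(f"{disk_size_gb}G Byte disks -> {capacity}G Bytes minimum")
```

Seven 73G Byte disks give 511G Bytes (the "approximately 500G Bytes" above), and seven 150G Byte disks give 1,050G Bytes, just over a terabyte.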

Storage system administrators could opt to divide these disks into multiple partitions for different applications, but performance would suffer: two separate servers or server processes would be instructing the disk heads to move rapidly between partitions, massively increasing data seek and fetch times. In a system tuned for high performance, this is not really a practical option.

Fuzzy SCSI

Another rapidly approaching storage limit is imposed by the SCSI specification. Today, given a standard 512-byte block size, it is impossible to create a data partition over two raw terabytes in size and subsequently access the data using the SCSI protocol, including SCSI over Fibre Channel (FCP). If disk performance relative to capacity continues to drop and sizes continue to increase, available data space will soon have to be abandoned in systems where high performance is to be maintained!
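A rough sketch of where the two-terabyte figure comes from, assuming the limit arises from a 32-bit logical block address combined with 512-byte blocks. The 32-bit width is an assumption stated here for illustration, not something spelled out in this article, and the names below are hypothetical.

```python
# Illustrative sketch: largest partition addressable with a fixed-width
# logical block address (LBA). The 32-bit width is an assumption.

BLOCK_SIZE_BYTES = 512   # standard block size mentioned in the article
LBA_BITS = 32            # assumed width of the block address field


def max_addressable_bytes(lba_bits: int = LBA_BITS,
                          block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Return the number of bytes reachable with the given address width."""
    return (2 ** lba_bits) * block_size


if __name__ == "__main__":
    limit = max_addressable_bytes()
    print(f"{limit:,} bytes")                      # raw byte count
    print(f"{limit / 2**40:.0f} binary terabytes")  # expressed in 2**40 units
```

Under these assumptions the ceiling is 2^32 blocks of 512 bytes, or exactly two binary terabytes.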

Though this effect has been commented on previously, only now with the approach of the hard SCSI two-terabyte limit will the problem require resolution. For the first time, additional disks cannot be the answer without wasted storage capacity. If we follow trend lines accompanying this article, it is reasonable to assume that by 2004 we will have 360G Byte disks that will actually be slower (in access per gigabyte) than today's disks.

Assuming this slowdown still requires only seven disks to build a partition with reasonable performance, the result will be a 2.5T Byte partition. Assuming this partition is not further broken down into multiple file systems, since we need to maintain maximum performance, the SCSI 2T Byte limit would require that roughly 500G Bytes of the volume remain unused. This effect, if not resolved, will thwart the decrease in storage costs that a CIO benefits from today.
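The stranded capacity in this scenario can be worked through as follows. This sketch uses the article's round figures (seven 360G Byte disks and a 2T Byte limit treated as 2,000G Bytes); `stranded_capacity_gb` is a hypothetical helper name.

```python
# Illustrative sketch: capacity left unusable when a volume's raw size
# exceeds the partition size limit. Figures are the article's round numbers.

SCSI_LIMIT_GB = 2000  # the 2T Byte limit, treated as 2,000G Bytes


def stranded_capacity_gb(disks: int, disk_size_gb: int,
                         limit_gb: int = SCSI_LIMIT_GB) -> int:
    """Return the G Bytes that cannot be addressed in a single partition."""
    raw_capacity_gb = disks * disk_size_gb
    return max(0, raw_capacity_gb - limit_gb)


if __name__ == "__main__":
    unused = stranded_capacity_gb(disks=7, disk_size_gb=360)
    print(f"Raw capacity: {7 * 360}G Bytes, unused: {unused}G Bytes")
```

Seven 360G Byte disks total 2,520G Bytes, so about 520G Bytes (the "500G Bytes" above) would sit beyond the limit. With today's 73G Byte disks the same calculation yields zero, which is why the problem has not yet surfaced.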

Fortunately, a solution is coming. The T10 committee, whose job it is to propose and control SCSI standardization, has a working draft called SCSI Block Commands 2 (or SBC-2) that will move the 2T Byte limit on partition sizes to 9.5 million petabytes per partition, which ought to hold things for a while. Currently, this standard is at an early stage (revision 6 was released on May 6, 2002), so the key question is whether these new commands can make it into storage arrays, RAID controllers and servers before this limit is reached in the enterprise.
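As a sanity check on the scale of the new limit, the sketch below assumes SBC-2 widens the logical block address from 32 to 64 bits while keeping 512-byte blocks. That assumption is consistent with the figure quoted above but is not stated in this article, and the names used are hypothetical.

```python
# Illustrative sketch: comparing an assumed 32-bit and 64-bit block
# address space at 512 bytes per block.

BLOCK_SIZE_BYTES = 512


def addressable_bytes(lba_bits: int,
                      block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Return the bytes reachable with an address of the given bit width."""
    return (2 ** lba_bits) * block_size


old_limit = addressable_bytes(32)   # assumed current limit
new_limit = addressable_bytes(64)   # assumed SBC-2 limit

new_limit_petabytes = new_limit / 10**15   # decimal petabytes
growth_factor = new_limit // old_limit     # how many times larger

if __name__ == "__main__":
    print(f"New limit: about {new_limit_petabytes / 1e6:.1f} million petabytes")
    print(f"Growth factor: 2**{growth_factor.bit_length() - 1}")
```

Under these assumptions the new ceiling works out to a little over nine million decimal petabytes, in line with the figure above, and is about four billion times larger than the old one.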

The race is on.

Copyright 2002, Blue Arc Corporation.
