Disk striping is the process of dividing a body of data into blocks and spreading those blocks across multiple storage devices, such as hard disks or solid-state drives (SSDs). A stripe is the data divided across the set of hard disks or SSDs, while a stripe unit, or strip, is the slice of that data on an individual drive.
Storage systems vary in the way they perform data striping. A system may stripe data at the byte, block or partition level, and it can stripe data across all or only some of the disks in a cluster. For instance, a storage system with 10 hard disks might stripe 64 KB blocks across the first, second, third, fourth and fifth disks, one block per disk, and then start over again at the first disk. Another system might write 1 megabyte (MB) to each of its 10 disks before returning to the first disk to repeat the process.
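The round-robin placement described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular storage system implements striping; the function name locate_block and its parameters are hypothetical, and the sketch assumes block-level striping with a fixed number of blocks per strip.

```python
def locate_block(logical_block: int, num_disks: int, blocks_per_strip: int = 1) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    strip_number = logical_block // blocks_per_strip   # which strip, counting across all disks
    offset_in_strip = logical_block % blocks_per_strip  # position inside that strip
    disk = strip_number % num_disks                     # round-robin choice of disk
    stripe_row = strip_number // num_disks              # number of full stripes before this one
    return disk, stripe_row * blocks_per_strip + offset_in_strip


# Five disks, one block per strip: block 5 wraps back to the first disk (index 0).
layout = [locate_block(b, num_disks=5) for b in range(7)]
print(layout)
# [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1), (1, 1)]
```

Raising blocks_per_strip models a larger strip size, such as the 1 MB per disk case, where several consecutive blocks land on the same disk before the system moves to the next one.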
Pros and cons of disk striping
The main advantage of disk striping is higher performance. For example, striping data across three hard disks can provide up to three times the bandwidth of a single drive, because the drives service requests in parallel. If each drive delivers 200 input/output operations per second (IOPS), disk striping would make available up to 600 IOPS for data reads and writes.
The disadvantage of disk striping is low resiliency. The failure of any physical drive in the striped disk set results in the loss of the stripe units on that drive and, because every stripe depends on those units, the loss of the entire data set stored across the set of striped hard disks.
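To put a rough number on this risk, the sketch below estimates the chance that a striped set without parity survives a given period. It rests on a simplifying assumption that is not from the text above: drives fail independently and each has the same survival probability. The function name and the 0.98 figure are made up for illustration.

```python
def stripe_set_survival(per_drive_survival: float, num_drives: int) -> float:
    """Probability that every drive survives, assuming independent, identical drives."""
    # The striped set is intact only if all of its drives are intact.
    return per_drive_survival ** num_drives


# Hypothetical per-drive survival probability of 0.98 over some period.
for n in (1, 3, 10):
    print(n, round(stripe_set_survival(0.98, n), 4))
```

Under these assumptions, each added drive lowers the survival probability of the whole set, which is the trade-off against the added performance.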
Disk striping and RAID
Redundant array of independent disks (RAID) technology uses disk striping at most of its levels to distribute and store data across multiple physical drives. Disk striping on its own is synonymous with RAID 0, which spreads the data across all the disk drives in a RAID group without parity. Disk striping without parity is not fault tolerant.
Disk striping without parity may be used for temporary data, scratch space or in situations where a master copy of the data is easily recoverable from another storage device.
Disk striping with parity
To address the potential for data loss with RAID 0, a RAID set typically reserves at least one stripe unit in each stripe for parity. The parity information is commonly calculated by using the binary exclusive or (XOR) function and stored on a physical drive in the RAID set. If a storage drive in the striped RAID set fails, the data is recoverable from the remaining drives and the parity information.
For a RAID set with n drives, the data might be striped on drives 1 through n-1, and the nth drive would be reserved for parity. For example, in a RAID set with 10 drives, data could be striped to nine drives, and the 10th drive would be used for parity.
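The XOR parity calculation and the rebuild of a failed drive can be shown with a short Python sketch. It assumes a single dedicated parity drive and equal-length strips; the helper name xor_strips and the strip contents are invented for the example.

```python
from functools import reduce


def xor_strips(strips: list[bytes]) -> bytes:
    """XOR equal-length strips together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Three data drives with made-up contents, plus one parity drive.
data_strips = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_strips(data_strips)

# Simulate losing the second drive, then rebuild it from the survivors plus parity.
survivors = [data_strips[0], data_strips[2]]
rebuilt = xor_strips(survivors + [parity])
assert rebuilt == data_strips[1]
```

The rebuild works because XOR is its own inverse: XORing the parity with every surviving strip cancels those strips out and leaves only the missing one.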
Disk striping with parity provides redundancy and reliability. RAID 4, which keeps parity on a dedicated drive, and RAID 5, which distributes parity across the drives, both protect against a single drive failure. RAID 6 devotes the capacity of two drives to parity and protects against two drive failures. Data protection can be extended beyond two storage device failures using erasure coding.
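One disadvantage of disk striping with parity is the performance penalty for small random writes. Even when only one stripe unit changes, the system must also update the parity for that stripe, which means reading existing data and parity before writing the new data and the new parity.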
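One common way to handle such a small write is a read-modify-write cycle, sketched below for single XOR parity. This illustrates the general technique rather than any specific controller's behavior; the function names and byte values are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_after_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity = old parity XOR old data XOR new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Made-up stripe: three data strips plus their XOR parity.
strips = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\x55"]
parity = xor_bytes(xor_bytes(strips[0], strips[1]), strips[2])

# Overwrite only the second strip.
new_strip = b"\xaa\xaa"
new_parity = parity_after_small_write(strips[1], parity, new_strip)

# The shortcut gives the same parity as recomputing it from the whole stripe.
full_recompute = xor_bytes(xor_bytes(strips[0], new_strip), strips[2])
assert new_parity == full_recompute
```

In this scheme, one logical write turns into two reads (old data and old parity) and two writes (new data and new parity), which is where the penalty comes from.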
Disk striping and disk mirroring
Disk striping can be combined with disk mirroring, or RAID 1, to improve performance and expand capacity by striping data across multiple sets of mirrored drives, an arrangement commonly called RAID 10. The disadvantage of disk striping with mirroring is the 50% overhead inherent in using half the capacity to make an exact copy of the data for protection.
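The capacity trade-offs among these layouts can be compared with a small calculation. The sketch assumes equal-size drives, single parity for RAID 5, double parity for RAID 6, and two-way mirrors for striping over mirrored pairs; the function name and level labels are hypothetical.

```python
def usable_capacity(level: str, num_drives: int, drive_size_tb: float) -> float:
    """Usable capacity in TB for a set of equal-size drives (simplified model)."""
    data_drives = {
        "raid0": num_drives,       # striping only, no redundancy
        "raid5": num_drives - 1,   # one drive's worth of parity
        "raid6": num_drives - 2,   # two drives' worth of parity
        "raid10": num_drives / 2,  # half the drives hold mirror copies
    }[level]
    return data_drives * drive_size_tb


# Eight hypothetical 4 TB drives.
for level in ("raid0", "raid5", "raid6", "raid10"):
    print(level, usable_capacity(level, num_drives=8, drive_size_tb=4.0))
```

With eight drives, the mirrored layout gives up half the raw capacity, while the parity layouts give up one or two drives' worth.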