Petabyte storage vendors
Barely a decade ago, data storage vendors would boast that all of the storage systems they had ever sold added up to a petabyte or two. Because storage capacity requirements have continued to grow rapidly, it's now common to see individual companies, and even single storage systems, with more than a PB of capacity.
In 2015, Fujitsu released its Eternus DX S3 block storage devices, which can scale from 4.6 PB to 13.8 PB of raw capacity. The HGST Active Archive System, also released in 2015, scales to 4.7 PB of raw capacity. DataDirect Networks released EXAScaler storage arrays with up to 14 PB of capacity across two racks. And the latest EMC Isilon network-attached storage (NAS) arrays can scale up to 50 PB.
Petabyte storage and backups
Petabyte-scale systems are poorly suited to traditional backups, which must scan the entire system each time a backup or archiving job runs. Traditional NAS can scale to petabytes of data, but walking its hierarchical storage index at that scale takes too much time and consumes too many resources. There are, however, a number of other data storage technologies that can back up and archive data at petabyte scale:
- Snapshots and other disk-based backup technologies provide a local copy of the data, enabling a rapid restore.
- Tape and the cloud provide relatively low-cost backup options for petabytes of data, but are more often used as off-site archival storage rather than primary storage.
- Solid-state storage can scan petabytes of data far faster than spinning disk without sacrificing data integrity.
- Object storage assigns each object a unique identifier, allowing the system to search large amounts of data in a flat space as opposed to examining a complete storage index to find a specific file.
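The object storage approach above can be illustrated with a minimal sketch. This is not any vendor's API; the `ObjectStore` class, its `put`/`get` methods, and the use of a UUID as the unique identifier are illustrative assumptions. The point is that the identifier maps directly to the object in a flat namespace, so retrieval never traverses a directory tree, no matter how many objects are stored.

```python
import uuid

class ObjectStore:
    """Hypothetical flat-namespace store: one mapping from unique ID to object.
    Real object stores (e.g. S3-style systems) distribute this mapping across
    many nodes, but the lookup model is the same."""

    def __init__(self):
        self._objects = {}  # flat namespace: no directories, no hierarchy

    def put(self, data):
        object_id = str(uuid.uuid4())  # system assigns a unique identifier
        self._objects[object_id] = data
        return object_id

    def get(self, object_id):
        # Direct lookup by identifier -- no storage index to walk,
        # regardless of how many objects the system holds.
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"backup chunk 0001")
assert store.get(oid) == b"backup chunk 0001"
```

Contrast this with a hierarchical file system, where finding a file means resolving each directory component of its path; at petabyte scale that index traversal is what makes traditional backup scans slow.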
Petabytes and big data
There is no specific quantity of data that qualifies as big data, but the term often refers to information in the petabyte, or even exabyte, range. Mining for information across petabytes of data is a time-consuming task. Organizations working with big data often use the Hadoop Distributed File System because it facilitates rapid data transfer and allows a system to operate uninterrupted while working with petabytes of data.
Also see petaflop.