Data storage refers collectively to the methods and technologies that capture and retain digital information on electromagnetic, optical or silicon-based storage media. Storage is a key component of digital devices, as consumers and businesses have come to rely on it to preserve everything from personal photos to business-critical information.
The term storage is frequently used to describe the devices and data connected to a computer through input/output (I/O) operations, including hard disks, flash devices, tape systems and other media types.
Why data storage is important
Underscoring the importance of storage is a steady climb in the generation of new data, which is attributable to big data and the profusion of internet of things (IoT) devices. Modern storage systems require enhanced capabilities to allow enterprises to apply machine learning-enabled artificial intelligence (AI) to capture this data, analyze it and wring maximum value from it.
Larger application scripts and real-time database analytics have contributed to the advent of highly dense and scalable storage systems, including high-performance computing storage, converged infrastructure, composable storage systems, hyper-converged storage infrastructure, scale-out and scale-up network-attached storage (NAS) and object storage platforms.
By 2025, it is expected that 163 zettabytes (ZB) of new data will be generated, according to a report by IT analyst firm IDC. That estimate represents a potential tenfold increase from the 16 ZB produced through 2016.
How data storage works
The term storage may refer both to a user's data generally and, more specifically, to the integrated hardware and software systems used to capture, manage and prioritize the data. This includes information in applications, databases, data warehouses, archives, backup appliances and cloud storage.
Digital information is written to target storage media through the use of software commands. The smallest unit of measure in a computer memory is a bit, described with a binary value of 0 or 1, according to the level of electrical voltage contained in a single capacitor. Eight bits make up one byte.
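As a minimal illustration of the bit and byte relationship described above, the following Python sketch encodes a two-character string and shows the eight bits inside each byte. The sample string and variable names are illustrative assumptions and do not come from any storage product or API.

```python
# Encode a short string and inspect the bits that make up each byte.
text = "Hi"
data = text.encode("ascii")  # two ASCII characters become two bytes

# Render each byte as a string of eight binary digits.
bit_strings = [format(byte, "08b") for byte in data]

# Eight bits make up one byte, so the total bit count is bytes * 8.
total_bits = len(data) * 8

print(bit_strings)  # ['01001000', '01101001']
print(total_bits)   # 16
```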
Other capacity measurements to know are:
- kilobit (Kb)
- megabit (Mb)
- gigabit (Gb)
- terabit (Tb)
- petabit (Pb)
- exabit (Eb)
Larger measures include:
- kilobyte (KB) equal to 1,024 bytes
- megabyte (MB) equal to 1,024 KB
- gigabyte (GB) equal to 1,024 MB
- terabyte (TB) equal to 1,024 GB
- petabyte (PB) equal to 1,024 TB
- exabyte (EB) equal to 1,024 PB
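The byte units listed above each step up by a factor of 1,024, which can be sketched in a few lines of Python. The function names `format_capacity` and `bits_to_bytes` are hypothetical helpers written for this illustration only.

```python
# Units in ascending order; each one is 1,024 times the previous unit.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def format_capacity(num_bytes: int) -> str:
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


def bits_to_bytes(num_bits: int) -> float:
    """Convert a bit count to bytes, using eight bits per byte."""
    return num_bits / 8


print(format_capacity(1536))           # 1.50 KB
print(format_capacity(3 * 1024 ** 4))  # 3.00 TB
print(bits_to_bytes(8_000_000))        # 1000000.0
```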
Few organizations require a single storage system or connected system that can reach an exabyte of data, but there are storage systems that scale to multiple petabytes.
Data storage capacity requirements define how much storage is needed to run an application, a set of applications or data sets. Capacity requirements take into account the types of data. For instance, simple documents may only require kilobytes of capacity, while graphic-intensive files, such as digital photographs, may take up megabytes, and a video file can require gigabytes of storage. Computer applications commonly list the minimum and recommended capacity requirements needed to run them.
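A rough capacity estimate of the kind described above might be sketched as follows. The per-file sizes, the `headroom` parameter and the function name `required_capacity` are assumptions chosen only to show the orders of magnitude involved; real sizing would rely on measured data.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Hypothetical typical sizes per file type, for illustration only.
TYPICAL_SIZE = {
    "document": 50 * KB,
    "photo": 4 * MB,
    "video": 2 * GB,
}


def required_capacity(file_counts: dict, headroom: float = 0.2) -> int:
    """Estimate bytes needed for the given file counts, plus fractional headroom."""
    raw = sum(TYPICAL_SIZE[kind] * count for kind, count in file_counts.items())
    return math.ceil(raw * (1 + headroom))


estimate = required_capacity({"document": 1000, "photo": 500, "video": 20})
print(f"{estimate / GB:.1f} GB")
```

The headroom factor reflects the common practice of leaving spare capacity for growth; the 20% default here is an arbitrary placeholder.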
On an electromechanical disk, blocks of data are stored as bytes within sectors. A hard disk is a circular platter coated with a thin layer of magnetic material. The platter is mounted on a spindle and spins at speeds of up to 15,000 revolutions per minute (rpm). As it rotates, data is written to the disk surface by magnetic recording heads. A high-speed actuator arm positions the recording head over the first available space on the disk, allowing data to be written in a circular fashion.
A sector on a standard disk is 512 bytes. Recent advances in disk technology include shingled magnetic recording, in which data is written in overlapping tracks to boost the platter's areal density.
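The sector size and spindle speed mentioned above lend themselves to some simple arithmetic, sketched here in Python. The helper names are hypothetical, and the latency figure uses the common simplification that the head waits, on average, half a revolution for the target sector to arrive.

```python
SECTOR_BYTES = 512  # standard sector size cited in the text


def sectors_needed(file_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Number of whole sectors required to hold a file (rounded up)."""
    return (file_bytes + sector_bytes - 1) // sector_bytes


def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for a sector: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


print(sectors_needed(1_000_000))          # 1954
print(avg_rotational_latency_ms(15_000))  # 2.0
```

This ignores seek time and transfer time, so it is a lower bound on access latency rather than a full model of disk performance.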
On solid-state drives (SSDs), data is written to pooled NAND flash, designed with floating gate transistors that enable the cell to retain an electrical charge. An SSD is not technically a drive. Its design is closer to that of an integrated circuit, featuring potentially millions of nanoscale transistors placed on millimeter-sized silicon chips.
Backup data copies are written to disk appliances with the aid of a hierarchical storage management system. Although the practice is less common than in years past, some organizations still write disk-based backup data to magnetic tape as a tertiary storage tier. This is a best practice in organizations subject to legal regulations.
A virtual tape library (VTL) uses no tape at all. It is a system in which data is written sequentially to disks, but retains the characteristics and properties of tape. The value of a VTL is its quick recovery and scalability.
Evaluating the storage hierarchy
Organizations increasingly use tiered storage to automate data placement on different storage media, based on an application's capacity, compliance and performance requirements.
Enterprise data storage is often classified as primary and secondary storage, depending on how the data is used and the type of media it requires. Primary storage handles application workloads central to a company's day-to-day production and main lines of business.
Primary storage is occasionally referred to as main storage or primary memory. Data is held in random access memory (RAM) and other built-in devices, such as the processor's L1 cache. Secondary storage encompasses data on flash, hard disk, tape and other devices requiring I/O operations. Secondary storage media are often used in backup and cloud storage.
Primary storage generally provides faster access than secondary storage due to the proximity of storage to the computer processor. On the other hand, secondary storage can hold much more data than primary storage. Secondary storage also replicates inactive data to a backup storage device, yet keeps it highly available in case it is needed again.
Digital transformation of business has prompted more and more companies to deploy multiple hybrid clouds, adding a remote tier to buttress local storage.
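The idea of placing data on different tiers according to capacity, compliance and performance requirements can be sketched as a simple policy function. Everything in this example is hypothetical: the `Workload` fields, the tier names and the 50 TB flash limit are assumptions made for illustration, and real tiering software applies far richer policies.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    latency_sensitive: bool   # performance requirement
    size_tb: float            # capacity requirement
    regulated: bool = False   # compliance requirement


def choose_tier(w: Workload, flash_limit_tb: float = 50.0) -> str:
    """Pick a storage tier using a deliberately simple, hypothetical policy."""
    if w.latency_sensitive:
        # Fast media for hot data, falling back to disk when it is too large.
        return "flash" if w.size_tb <= flash_limit_tb else "disk"
    if w.regulated:
        # Inactive data kept for legal reasons goes to a tape tier.
        return "tape"
    # Everything else moves to a remote cloud tier.
    return "cloud"


for w in [
    Workload("orders-db", True, 5),
    Workload("analytics", True, 200),
    Workload("tax-records", False, 20, regulated=True),
    Workload("old-logs", False, 20),
]:
    print(w.name, "->", choose_tier(w))
```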
Types of data storage devices/mediums
Data storage media have varying levels of capacity and speed. These include cache memory; dynamic RAM (DRAM), or main memory; magnetic tape and magnetic disk; optical discs, such as CDs, DVDs and Blu-ray discs; and flash memory and various iterations of in-memory storage.
Along with main memory, computers contain read-only memory (ROM). ROM is nonvolatile, meaning it retains its contents without power, and read-only, meaning data cannot be written to it.
The main types of storage media in use today include hard disk drives (HDDs), solid-state storage, optical storage and tape. Spinning HDDs use stacked platters coated in magnetic media, with disk heads that read and write data to the media. HDDs are widely used for storage in personal computers, servers and enterprise storage systems, but SSDs are starting to reach performance and price parity with disk.
SSDs store data on nonvolatile flash memory chips. Unlike spinning disk drives, SSDs have no moving parts. They are increasingly found in all types of computers, although they remain more expensive than HDDs. Some manufacturers are also shipping storage devices that combine RAM and flash, although these have not yet gone mainstream.
Optical data storage is popular in consumer products, such as computer games and movies, and is also used in high-capacity data archiving systems.
Flash memory cards are integrated in digital cameras and mobile devices, such as smartphones, tablets, audio recorders and media players. Flash memory is found on Secure Digital cards, CompactFlash cards, MultiMediaCards and USB memory sticks.
Physical floppy disks, an early removable magnetic storage medium, are rarely used in the era of flash. Unlike older models, newer computer systems are not equipped with slots to insert floppy disks. Use of floppy disks started in the 1970s but was phased out in the late 1990s. Virtual floppy disks are sometimes used in place of the 3.5-inch physical diskette, allowing users to mount an image file mapped to the A: drive on a computer.
Enterprise storage networks and server-side flash
Enterprise storage vendors provide integrated NAS systems to help organizations collect and manage large volumes of data. The hardware includes storage arrays or storage servers equipped with hard drives, flash drives or a hybrid combination, and storage OS software to deliver array-based data services.
The storage management software offers data protection tools for archiving, clones, copy data management, replication and snapshots. Data reduction features, including compression, data deduplication and thin provisioning, are becoming standard features of most storage arrays. The software also provides policy-based management to govern data placement for tiering to secondary data storage or a hybrid cloud to support a disaster recovery plan or long-term retention.
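Two of the data reduction features named above, deduplication and compression, can be illustrated with a small Python sketch. This is a simplified, assumed design (fixed-size blocks fingerprinted with SHA-256 and compressed with zlib) and not a description of how any particular storage array implements these features; the function names and the 4,096-byte block size are hypothetical.

```python
import hashlib
import zlib


def dedupe_and_compress(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, store each unique block once, compressed."""
    store = {}   # fingerprint -> compressed block
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = zlib.compress(block)
        recipe.append(fingerprint)
    return store, recipe


def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original data by decompressing blocks in recipe order."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


# Three identical blocks plus one different block: only two are stored.
original = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedupe_and_compress(original)
stored_bytes = sum(len(block) for block in store.values())
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
print(len(original), "bytes reduced to", stored_bytes, "bytes")
```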
Since 2011, an increasing number of enterprises have implemented all-flash arrays outfitted only with NAND flash-based SSDs, either as an adjunct to or a replacement for disk arrays.
Unlike disk, flash storage devices do not rely on moving mechanical parts to store data, thus offering faster access to data and lower latency than HDDs. Flash is nonvolatile, allowing data to persist in memory even if the storage system loses power. Disk-based storage systems, by contrast, typically depend on onboard battery backup or capacitors to protect data held in volatile cache if power is lost. However, flash has not yet achieved an endurance equivalent to disk, leading to hybrid arrays that integrate both types of media.
There are three basic designs of networked storage systems. In its simplest configuration, direct-attached storage (DAS) involves the internal hard drive in an individual computer. In the enterprise, DAS can be a cluster of drives in a server or a group of external drives that attach directly to the server through the Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), Fibre Channel (FC) or Internet SCSI (iSCSI).
NAS is a file-based architecture in which multiple file nodes are shared by users, typically across an Ethernet-based local area network (LAN) connection. The advantage of NAS is that filers do not require a full-featured enterprise storage operating system. NAS devices are managed with a browser-based utility, and each node on the network is assigned a unique IP address.
Closely related to scale-out NAS is object storage, which eliminates the necessity of a file system. Each object is represented by a unique identifier. All the objects are presented in a single flat namespace.
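The flat namespace described above can be pictured as a single lookup table keyed by unique identifiers, with no directory hierarchy. The following Python sketch is a toy in-memory model under that assumption; the `ObjectStore` class and its methods are hypothetical and do not correspond to any vendor's object storage API.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store with one flat namespace."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        """Store an object and return its unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve an object's data by its identifier."""
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        """Retrieve the metadata stored alongside an object."""
        return self._objects[object_id][1]


object_store = ObjectStore()
photo_id = object_store.put(b"...image bytes...", content_type="image/jpeg")
report_id = object_store.put(b"quarterly numbers", owner="finance")

print(object_store.get(report_id))
print(object_store.metadata(photo_id))
```

Objects are addressed only by ID, so there are no paths or folders to traverse; any grouping is expressed through metadata.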
A storage area network (SAN) can be designed to span multiple data center locations that need high-performance block storage. In a SAN environment, block devices appear to the host as locally attached storage. Each server on the network is able to access shared storage as though it were a direct-attached drive.
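Block storage, as presented by a SAN, exposes fixed-size blocks addressed by number rather than files or objects. The following Python sketch models that interface in memory. The `BlockDevice` class, its method names and the 512-byte block size are assumptions for illustration; a real SAN carries these reads and writes over protocols such as FC or iSCSI.

```python
class BlockDevice:
    """Toy in-memory block device addressed by logical block number."""

    def __init__(self, num_blocks: int, block_size: int = 512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)  # zero-filled

    def _offset(self, block_number: int) -> int:
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")
        return block_number * self.block_size

    def write_block(self, block_number: int, payload: bytes) -> None:
        """Overwrite one whole block; partial writes are rejected."""
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        start = self._offset(block_number)
        self._data[start:start + self.block_size] = payload

    def read_block(self, block_number: int) -> bytes:
        """Return the contents of one whole block."""
        start = self._offset(block_number)
        return bytes(self._data[start:start + self.block_size])


device = BlockDevice(num_blocks=4)
device.write_block(2, b"x" * 512)
print(device.read_block(2)[:8])  # b'xxxxxxxx'
print(device.read_block(0)[:8])  # eight zero bytes
```

The host's file system, not the storage network, decides how files map onto these numbered blocks, which is why shared SAN volumes can appear as locally attached drives.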
Advances in NAND flash, coupled with falling prices in recent years, have paved the way for software-defined storage. Using this configuration, an enterprise installs commodity-priced SSDs in an x86-based server, using third-party storage software or custom open source code to apply storage management.
Nonvolatile memory express (NVMe) is a developing industry protocol for flash. Industry observers expect NVMe to emerge as the de facto standard for flash storage. NVMe flash will allow applications to communicate directly with a central processing unit (CPU) via Peripheral Component Interconnect Express (PCIe) links, bypassing SCSI command sets transported to a network host bus adapter. NVMe over Fabrics (NVMe-oF) is intended to speed the transfer of data between a host computer and flash target, using established Ethernet, FC or InfiniBand network connectivity.
A nonvolatile dual inline memory module (NVDIMM) is hybrid NAND and DRAM with integrated backup power that plugs into a standard DIMM slot on a memory bus. NVDIMMs only use flash for backup, processing normal calculations in the DRAM. An NVDIMM puts flash closer to the motherboard, presuming the computer's manufacturer has modified the server and developed basic input-output system (BIOS) drivers to recognize the device. NVDIMMs are a way to extend system memory or add a jolt of high-performance storage, rather than to add capacity. Current NVDIMMs on the market top out at 32 GB, but the form factor has seen density increases from 8 GB to 16 GB in just a few years.
Major data storage vendors
Consolidation in the enterprise storage market has winnowed the field of primary NAS and SAN array vendors in recent years. Storage vendors that penetrated the market with disk products now derive most of their sales from all-flash or hybrid flash. Market-leading vendors include:
- Dell EMC, the storage division of Dell Technologies
- Hewlett Packard Enterprise (HPE)
- HPE Nimble Storage
- Hitachi Vantara
- IBM Storage
- Pure Storage
- Quantum Corp.
- Tegile Systems, part of Western Digital Corp.
Smaller NAS vendors include Drobo, iXsystems, Panasas and Synology. Leading hyper-converged infrastructure (HCI) vendors include Atlantis Computing, Cisco (HyperFlex), HPE SimpliVity, Nutanix, Pivot3, Promise Technology, Scale Computing and VMware vSAN. Most major enterprise storage vendors also offer branded HCI and converged infrastructure products.