Definition

ZFS

Carol Sliwa

ZFS is a local file system and logical volume manager created by Sun Microsystems Inc. to direct and control the placement, storage and retrieval of data in enterprise-class computing systems.

The ZFS file system and volume manager is characterized by data integrity, high scalability and built-in storage features such as:

Replication - the process of copying data to a separate system for purposes such as backup or migration.
Deduplication - a process that eliminates redundant copies of data and reduces storage overhead.
Compression - a reduction in the number of bits needed to represent data.
Snapshots - a set of reference markers for data at a particular point in time.
Clones - modifiable copies of a file system created from a snapshot.
Data protection - the process of safeguarding important information from corruption and/or loss.

History of ZFS

Sun engineers began development work on ZFS in 2001 for the company's Unix-based Solaris operating system (OS). In 2005, Sun released the ZFS source code under the Common Development and Distribution License (CDDL) as part of the open source OpenSolaris OS. A community of developers, including representatives from Sun and other vendors, worked on enhancements to the open source code and ported ZFS to additional OSes, including FreeBSD, Linux and Mac OS X.

The OpenSolaris open source project, which included ZFS, ended after Oracle Corp. acquired Sun in 2010 and trademarked the term ZFS. Engineers at Oracle continue to enhance and add features to ZFS on Solaris. Oracle uses its proprietary ZFS code as the foundation for Oracle Solaris, the Oracle ZFS Storage Appliance and other Oracle technologies.

A development community started a new open source project, called OpenZFS, based on the ZFS source code in the final release of OpenSolaris. The open community works on new features, improvements and bug fixes to the OpenZFS code. OSes that support OpenZFS include Apple OS X, FreeBSD, illumos (which is based on OpenSolaris), and Linux variants such as Debian, Gentoo and Ubuntu. OpenZFS works on all Linux distributions, but only some commercial vendors provide it as part of their distributions. Companies with commercial products built on OpenZFS include Cloudscaling, Datto, Delphix, Joyent, Nexenta, SoftNAS and Spectra Logic.

ZFS and OpenZFS tend to appeal to enterprises that need to manage large quantities of data and ensure data integrity. Users include scientific institutions, national laboratories, government agencies, financial firms, telecommunications, and media and entertainment companies.

ZFS initially stood for Zettabyte File System, but the word zettabyte no longer holds any significance in the context of the file system. As a 128-bit file system, ZFS has the potential to scale to 256 quadrillion zettabytes.
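
The 256 quadrillion zettabyte figure can be checked with a little arithmetic. The sketch below assumes binary prefixes (a zettabyte taken as 2^70 bytes and a quadrillion approximated as 2^50), which is an interpretation that the text above does not state:

```python
# Assumption: binary prefixes, which the article does not specify.
ZETTABYTE = 2 ** 70       # bytes in one (binary) zettabyte
QUADRILLION = 2 ** 50     # binary approximation of 10**15

address_space = 2 ** 128  # bytes addressable with 128 bits

# 256 is 2**8, so the exponents add up: 8 + 50 + 70 = 128.
print(address_space == 256 * QUADRILLION * ZETTABYTE)
```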

How ZFS works

ZFS is designed to run on a single server, potentially with hundreds if not thousands of attached storage drives. ZFS pools the available storage and manages all disks as a single entity. A user can add more storage drives to the pool if the file system needs additional capacity. ZFS is highly scalable and supports a large maximum file size.

ZFS stores at least two copies of metadata each time data is written to disk. The metadata includes information such as the disk sectors where the data is stored, the size of the data blocks and a checksum of the data. When a user requests access to a file, ZFS recomputes the checksum to verify that the retrieved data matches the original bits written to disk. If the computed checksum does not match the stored one, ZFS flags the data as bad. In systems with a mirrored storage pool or the ZFS version of RAID, ZFS can retrieve the correct copy from another drive and repair the damaged copy.
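
The verify-and-repair idea can be illustrated with a toy model. This is a minimal sketch, not ZFS code: the MirroredStore class, the use of SHA-256 and the choice to check every copy on each read are all assumptions made for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest; stands in for the checksum kept in metadata."""
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy two-way mirror (hypothetical): each block is written to both 'drives'."""

    def __init__(self):
        self.drives = [{}, {}]   # block id -> bytes, one dict per drive
        self.checksums = {}      # block id -> checksum recorded at write time

    def write(self, block_id: str, data: bytes) -> None:
        self.checksums[block_id] = checksum(data)
        for drive in self.drives:
            drive[block_id] = data

    def read(self, block_id: str) -> bytes:
        expected = self.checksums[block_id]
        good = None
        bad_drives = []
        for drive in self.drives:
            data = drive[block_id]
            if checksum(data) == expected:
                good = data
            else:
                bad_drives.append(drive)   # flag the copy that fails verification
        if good is None:
            raise IOError(f"no intact copy of block {block_id!r}")
        # Repair every damaged copy from the verified one.
        for drive in bad_drives:
            drive[block_id] = good
        return good


store = MirroredStore()
store.write("b1", b"payload")
store.drives[0]["b1"] = b"pay1oad"   # simulate silent corruption on drive 0
data = store.read("b1")              # returns the good copy and heals drive 0
```

Keeping the checksum apart from the data it describes is what lets the reader tell which of the two copies is the damaged one.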

ZFS is commonly referred to as a copy-on-write file system, although Oracle describes it as redirect on write. When ZFS writes data to disk, it does not overwrite data in place. ZFS writes a new block to a different spot on the disk and updates the metadata to point to the newly written block, while also retaining older versions of the data.
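
A minimal sketch of the redirect-on-write behavior described above, using a hypothetical in-memory store (the class and its fields are illustrative and not part of ZFS):

```python
class RedirectOnWriteStore:
    """Toy block store that never overwrites a block in place."""

    def __init__(self):
        self.blocks = []     # append-only list of "physical" blocks
        self.pointers = {}   # logical name -> index of the current block

    def write(self, name: str, data: bytes) -> int:
        self.blocks.append(data)          # new data goes to a new location
        new_index = len(self.blocks) - 1
        self.pointers[name] = new_index   # metadata now points at the new block
        return new_index

    def read(self, name: str) -> bytes:
        return self.blocks[self.pointers[name]]


store = RedirectOnWriteStore()
first = store.write("file", b"version 1")
second = store.write("file", b"version 2")

print(store.read("file"))    # current version, via the updated pointer
print(store.blocks[first])   # the older version is still retained
```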

A true copy-on-write file system would make an exact replica of a data block in a separate location before overwriting the original block. Before overwriting the data, the system would need to read the block's previous value. A copy-on-write file system requires three I/O operations -- read, modify and write -- for each data write. By contrast, a redirect-on-write system requires only one I/O operation, facilitating greater efficiency and higher performance.

ZFS is a popular choice for network-attached storage systems, running NFS on top of the file system, as well as in virtual server environments. Another common deployment scenario is layering a clustered file system, such as the General Parallel File System (GPFS) or Lustre, on top of ZFS to enable scaling to additional server nodes. OpenStack users can deploy ZFS as the underlying file system for Cinder block storage and Swift object storage.

Key features of ZFS

Snapshots and clones: ZFS and OpenZFS can make point-in-time copies of the file system with great efficiency and speed because the system retains all copies of the data. Snapshots are immutable copies of the file system, while clones can be modified. Snapshots and clones are integrated in boot environments with ZFS on Solaris, enabling users to roll back to a snapshot if anything goes wrong when they patch or update the system. Snapshots can also serve as a recovery technique against ransomware.
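
The difference between an immutable snapshot and a modifiable clone can be sketched as follows. The Dataset class and clone function are hypothetical stand-ins, and the model is far simpler than the real on-disk structures:

```python
from types import MappingProxyType


class Dataset:
    """Toy file system: a mutable map from file name to block contents."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def snapshot(self):
        # Copy only the name -> block map; the bytes objects are shared,
        # which is why taking a snapshot is cheap in this model.
        return MappingProxyType(dict(self.files))


def clone(snapshot) -> Dataset:
    """Create a writable dataset that starts from a snapshot's contents."""
    return Dataset(snapshot)


fs = Dataset()
fs.write("config", b"v1")
snap = fs.snapshot()
fs.write("config", b"v2")          # the live file system moves on

copy = clone(snap)
copy.write("config", b"patched")   # clones accept writes; snap stays at v1
```

Rolling back after a bad update corresponds, in this model, to replacing the live map with the contents of snap.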

RAID-Z: RAID-Z allows the same data to be stored in multiple locations to enhance fault tolerance and improve performance. Similar to RAID 5, RAID-Z stripes parity information across each drive to permit a storage system to function even if one drive fails. The system reconstructs the data on the lost drive using the information stored on the system's other drives. However, with RAID-Z, the striped data is a full block, which is variable in size. Although RAID-Z is typically compared to RAID 5, it performs some operations differently to address certain long-standing issues with traditional RAID. One issue that RAID-Z addresses is known as the write hole effect, where a system cannot determine which data or parity blocks have been written to disk after a power failure or catastrophic system interruption. Vendors of systems that use traditional RAID typically resolve the problem through the use of an uninterruptible power supply or dedicated hardware.
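
Reconstruction from parity can be shown with a simplified single-parity example. The sketch uses plain XOR over fixed-size blocks, as in textbook RAID 5; it does not model RAID-Z's variable-size stripes or its actual on-disk layout.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data drives
parity = xor_blocks(data_blocks)            # parity block for the stripe

# The drive holding the second block fails: rebuild it from survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block, which is why one lost drive can be tolerated.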

RAID-Z2 supports the loss of two storage drives, similar to RAID 6, and RAID-Z3 can tolerate the loss of three storage devices. Users have the option to arrange drives in groups, as with conventional RAID. For instance, a system with two groups of six drives set up as RAID-Z3 could tolerate the loss of three drives in each group.
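
The per-group tolerance in the example above can be expressed as a small check. The function name and the level labels used as dictionary keys are illustrative choices for this sketch:

```python
# Number of drive failures each level tolerates within one group.
PARITY_DRIVES = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def pool_survives(level, failed_per_group):
    """True if no group has lost more drives than its parity level allows."""
    limit = PARITY_DRIVES[level]
    return all(failed <= limit for failed in failed_per_group)


# Two groups of six drives, each set up as RAID-Z3.
print(pool_survives("raidz3", [3, 3]))   # three failures in each group
print(pool_survives("raidz3", [4, 0]))   # four failures in a single group
```

The limit applies to each group separately, so six failures spread evenly are survivable while four in one group are not.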

Compression: Inline data compression is a built-in feature in ZFS and OpenZFS to reduce the number of bits necessary to store data. ZFS and OpenZFS each support a number of compression algorithms. Users have the option to enable or disable inline compression.
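
As a rough illustration of inline compression with an on/off switch, the sketch below compresses each block on the write path. Python's zlib stands in for whichever algorithm is configured, and keeping the compressed form only when it is smaller is an assumption of this sketch rather than a statement about ZFS internals.

```python
import zlib


def store_block(data: bytes, compression: bool = True):
    """Return (stored_bytes, is_compressed) for one block."""
    if compression:
        packed = zlib.compress(data)
        if len(packed) < len(data):   # keep the compressed form only if it helps
            return packed, True
    return data, False


def load_block(stored: bytes, is_compressed: bool) -> bytes:
    """Reverse store_block, decompressing only when needed."""
    return zlib.decompress(stored) if is_compressed else stored


block = b"log line repeated\n" * 100
stored, flag = store_block(block)
print(len(block), len(stored), flag)
```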

Deduplication: Inline data deduplication is a built-in feature in ZFS and OpenZFS that enables storage efficiency by eliminating redundant data. ZFS and OpenZFS find the duplicate data by looking at the checksum for a block, which can vary in size. Users can enable or disable inline deduplication.
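
Checksum-based deduplication can be sketched with a table keyed by checksum. The DedupStore class, the reference counts and the use of SHA-256 are assumptions made for this example:

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one physical copy per unique checksum."""

    def __init__(self):
        self.blocks = {}     # checksum -> block contents
        self.refcount = {}   # checksum -> number of logical references

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data   # first time this content is seen
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key                    # caller keeps this as the block pointer

    def read(self, key: str) -> bytes:
        return self.blocks[key]


store = DedupStore()
a = store.write(b"same contents")
b = store.write(b"same contents")   # duplicate: no new physical block
c = store.write(b"different")

print(len(store.blocks), store.refcount[a])
```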

ZFS send/receive: ZFS and OpenZFS enable a snapshot of the file system to be sent to a different server node, allowing a user to replicate data to a separate system for purposes such as backup or data migration to cloud storage.
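
A replication command of this kind might be assembled as shown below. The sketch only builds the pipeline string and does not run it; the dataset, snapshot and host names are hypothetical, and using ssh as the transport between nodes is an assumption.

```python
import shlex


def replication_pipeline(snapshot: str, remote_host: str, target: str) -> str:
    """Build (but do not run) a pipeline that streams a snapshot to another host."""
    send = ["zfs", "send", snapshot]
    receive = ["ssh", remote_host, "zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


# Hypothetical dataset, snapshot and host names.
pipeline = replication_pipeline("tank/data@monday", "backup-host", "backup/data")
print(pipeline)
```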

Security: ZFS and OpenZFS support delegated permissions and fine-grained access control lists to manage who can perform administrative tasks. Users have the option to set ZFS as read-only, so no data can be changed. Oracle supports encryption in ZFS on Solaris.

ZFS advantages and limitations

ZFS integrates the file system and volume manager so users do not have to obtain and learn separate tools and sets of commands. ZFS offers a rich feature set and data services at no cost, since it is built into the Oracle OS. Open source OpenZFS is freely available. The file system can be expanded by adding drives to the storage pool. Traditional file systems require the disk partition to be resized to increase capacity, and users often need volume management products to help them.

ZFS is limited to running on a single server in contrast to distributed or parallel file systems, such as GPFS and Lustre, which can scale out to multiple servers.

The rich feature set offered with ZFS can make the software complicated to use and manage. Features such as the integrated ZFS checksum algorithms can require additional processing power and affect performance.

In the Linux community, there are various opinions on licensing with respect to the redistribution of the ZFS code and binary kernel modules. For instance, Red Hat considers it problematic to distribute code protected under the CDDL with code protected under the GNU General Public License (GPL). By contrast, Canonical, which distributes Ubuntu, determined that it is in compliance with the terms of the CDDL and GPL licenses.

ZFS vs. OpenZFS

Oracle's ZFS and open source OpenZFS derive from the same ZFS source code. On separate tracks, Oracle and the open source community have added extensions and made significant performance improvements to ZFS and OpenZFS, respectively. The Oracle ZFS updates are proprietary and available only in Oracle technologies. Updates to the open source OpenZFS code are freely available.

The list of enhancements that Oracle has made to ZFS since 2010 includes:

encryption;
support for the persistence of compressed data across OS reboots in the L2 adaptive replacement cache (ARC);
bootable Extensible Firmware Interface labels that provide support for physical disks and virtual disk volumes greater than 2 TB in size;
default user and group quotas; and
pool/file system monitoring.

The list of updates the open source community has made to OpenZFS includes:

additional compression algorithms;
resumable send/receive, which allows a long-running ZFS send/receive operation to restart from the point of a system interruption;
compressed send/receive, which allows the system to send compressed data from one ZFS pool to another without having to decompress/recompress the data when moving from the sending node to the destination;
compressed ARC, which allows ZFS to keep compressed data in memory, enabling a larger working data set in cache; and
maximum block size increase from 128 KB to 16 MB to improve performance when working with large files and speed data reconstruction when recovering from a drive failure.

This was last updated in March 2017