Capacity utilization is one area where storage vendors have made significant improvements. Advanced features such as storage pooling, thin provisioning, and storage virtualization have made the use of storage capacity more efficient.
Still, trying to understand capacity utilization can be confusing. Utilization must be examined at a larger scale than a single storage system, because storage virtualization can span systems. Thin provisioning overcommits capacity across systems, which can drive up utilization rates. The larger the pool, the more flexibility a system has in allocating storage resources.
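As a rough illustration of why utilization is better judged across a pool than per system, the sketch below computes pooled utilization and a thin-provisioning overcommit ratio. The system names, field names, and all figures are hypothetical and are not taken from any vendor's tooling.

```python
# Hypothetical per-system figures in TB; the numbers are illustrative only.
systems = [
    {"name": "array-a", "usable_tb": 100.0, "used_tb": 80.0, "provisioned_tb": 150.0},
    {"name": "array-b", "usable_tb": 200.0, "used_tb": 60.0, "provisioned_tb": 220.0},
]


def pooled_utilization(systems):
    """Used capacity divided by usable capacity, summed across all systems."""
    usable = sum(s["usable_tb"] for s in systems)
    used = sum(s["used_tb"] for s in systems)
    return used / usable


def overcommit_ratio(systems):
    """Thin-provisioned (promised) capacity divided by usable capacity."""
    usable = sum(s["usable_tb"] for s in systems)
    provisioned = sum(s["provisioned_tb"] for s in systems)
    return provisioned / usable


pool_util = pooled_utilization(systems)
pool_overcommit = overcommit_ratio(systems)

# Per-system utilization, for comparison with the pooled figure.
per_system = {s["name"]: s["used_tb"] / s["usable_tb"] for s in systems}

print(f"pooled utilization: {pool_util:.1%}")
print(f"overcommit ratio:   {pool_overcommit:.2f}")
```

In this made-up case one system looks heavily used and the other lightly used, while the pooled figure sits in between, which is the number that matters once capacity can be allocated from either system.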
Data reduction (compression and/or deduplication) usually allows more data to be stored in a given amount of storage. Data reduction effectiveness varies based on the data type and the implementation by the vendor. Data reduction represents a potential increase in usable capacity. Guidelines or guarantees from the vendor can be used to gauge that potential, and actual measurements are usually available from the management interfaces on the storage systems when data reduction is in use.
In any discussion of storage capacity utilization, it is useful to understand the basic definitions and update them to the current terminology for the technology in use. The following are some of the more basic terms and explanations.
Used capacity – the space holding stored data that can be accessed from hosts.
Usable capacity – storage space within a storage system or across pooled systems that can be configured for volumes (LUNs) or filesystems. This is the raw capacity minus the storage system overhead. The overhead includes data protection, such as RAID devices, allocated chunks in storage pools, and segments for forward error correction using codes such as erasure codes. Filesystems also reserve space for operational processes, which is not included in the usable capacity calculation.
Allocated but unused capacity – allocated storage space in a volume or filesystem with no data stored. This space is not available to other applications or filesystems, although it can be used later for data.
Effective capacity – the usable capacity multiplied by the expected effectiveness of data reduction.
Raw capacity – the aggregate capacity of the storage devices (hard disk drives, solid-state devices, flash modules).
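The relationships among the terms above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor formula: the class name, field names, and sample figures are hypothetical, the overhead is treated as a single lumped number, and used capacity is assumed to mean physical space consumed.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    """Hypothetical capacity figures for one system or pool, in TB."""
    raw_tb: float            # aggregate capacity of the storage devices
    overhead_tb: float       # RAID, erasure coding, pre-allocated snapshot space, etc.
    allocated_tb: float      # space allocated to volumes or filesystems
    used_tb: float           # space actually holding data (assumed physical)
    reduction_ratio: float = 1.0  # expected data reduction, e.g. 2.0 for 2:1

    @property
    def usable_tb(self) -> float:
        # Usable capacity: raw capacity minus storage system overhead.
        return self.raw_tb - self.overhead_tb

    @property
    def allocated_unused_tb(self) -> float:
        # Allocated space that holds no data yet.
        return self.allocated_tb - self.used_tb

    @property
    def effective_tb(self) -> float:
        # Effective capacity: usable capacity times expected data reduction.
        return self.usable_tb * self.reduction_ratio

    @property
    def utilization(self) -> float:
        # Fraction of usable capacity that holds data.
        return self.used_tb / self.usable_tb


cap = Capacity(raw_tb=100.0, overhead_tb=25.0, allocated_tb=60.0,
               used_tb=30.0, reduction_ratio=2.0)
print(f"usable: {cap.usable_tb} TB, effective: {cap.effective_tb} TB")
print(f"allocated but unused: {cap.allocated_unused_tb} TB")
print(f"utilization: {cap.utilization:.0%}")
```

The reduction ratio here is an input, standing in for either a vendor guideline or a figure measured by the storage system's management interface.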
Storage system data protection also has special considerations.
Snapshots – there are two primary types of implementations: Redirect-On-Write and Copy-On-Write. Redirect-On-Write is used with more recent storage pooling implementations, such as all-solid-state storage systems, where available space from the storage pool is used for the changed data. With thin provisioning, the recommendation is to not exceed 90% utilization, counting both snapshots and used capacity. Copy-On-Write implementations usually depend on pre-allocated capacity to contain a copy of the original data when a change is made. The pre-allocated space is included in the storage system overhead and reduces the usable capacity.
Replicated copies for disaster recovery / business continuance – these are volumes or filesystems, typically at remote sites, that represent a copy of the original active data. For capacity utilization calculation, the space is treated the same as any of the primary volumes – replication just means you need that much more capacity. The effect of low capacity utilization is multiplied with replication.
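The two considerations above lend themselves to simple arithmetic. The sketch below checks a thin-provisioned pool against the 90% guideline and shows how idle capacity scales with replication. The function names and figures are hypothetical, and the replication calculation assumes each replica is provisioned at the same size as the primary.

```python
def thin_pool_within_guideline(used_tb, snapshot_tb, usable_tb, limit=0.90):
    """True if used capacity plus snapshot space stays at or under the limit."""
    return (used_tb + snapshot_tb) / usable_tb <= limit


def idle_capacity_with_replication(usable_tb, used_tb, replica_count):
    """Unused usable capacity summed over the primary and its replicas.

    Assumes every replica is sized identically to the primary.
    """
    copies = 1 + replica_count
    return (usable_tb - used_tb) * copies


# Redirect-On-Write snapshots draw from the same pool as the data.
ok = thin_pool_within_guideline(used_tb=70.0, snapshot_tb=15.0, usable_tb=100.0)
too_full = thin_pool_within_guideline(used_tb=70.0, snapshot_tb=25.0, usable_tb=100.0)

# A primary at 40% utilization with one remote replica.
idle_tb = idle_capacity_with_replication(usable_tb=100.0, used_tb=40.0, replica_count=1)

print(ok, too_full, idle_tb)
```

In the last case, 60 TB of idle space on the primary becomes 120 TB once the remote copy is counted, which is the sense in which replication multiplies the effect of low utilization.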
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).