Most people are familiar with the Russian Matryoshka nesting dolls. As you open each doll, it reveals another smaller one inside. Storage utilization has a parallel to these dolls: Each component in the chain of storage has its own method of capturing metrics and making improvements.
If you focus only on host utilization, you could miss wasted space on the array. Furthermore, you can compound waste by inflating projections for growth or buffer space. Put all of these wasteful practices together, and you can wind up with less than 20% actual utilization while thinking you're doing fine.
Measuring utilization should be simple: Divide the amount of storage used by the amount available for use, and you have your utilization percentage. But obtaining these metrics can prove tricky, and the frame of reference is key. Additionally, many systems add overhead between the raw storage they can see and the usable storage they make available.
To measure storage utilization, we'll focus on the following three metrics:
- Raw storage is the amount that's visible to a system before RAID protection, formatting and other overhead.
- Usable storage is the amount that could hold user data.
- Used storage is the amount actually taken up by this data.
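As a minimal sketch of the arithmetic, the three metrics can be combined as follows. The capacity figures are hypothetical values chosen for illustration, not measurements from any real system.

```python
def utilization(used_gb: float, available_gb: float) -> float:
    """Return used storage as a percentage of the storage available."""
    if available_gb <= 0:
        raise ValueError("available storage must be positive")
    return 100.0 * used_gb / available_gb


# Hypothetical figures for one system, in GB (illustrative only).
raw_gb = 1000.0     # visible before RAID protection, formatting and other overhead
usable_gb = 700.0   # could hold user data
used_gb = 210.0     # actually taken up by data

print(f"used / usable: {utilization(used_gb, usable_gb):.1f}%")
print(f"used / raw:    {utilization(used_gb, raw_gb):.1f}%")
```

The same used figure yields a different percentage depending on whether raw or usable storage is the denominator, which is why the frame of reference has to be stated alongside any utilization number.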
Most storage managers focus on host file systems because they're easy to measure and highly visible. On Unix systems, the ubiquitous (but non-standard) df command shows usable and used storage space in file systems. For Windows systems, the same information is available in the drive properties window. In both cases, file system storage is the first line of visibility to both system managers and users.
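The same file system figures can also be collected by a script. The sketch below uses Python's standard shutil.disk_usage call; the helper name filesystem_utilization is our own, and the percentage may differ slightly from what df reports because tools treat reserved space differently.

```python
import shutil


def filesystem_utilization(path: str) -> float:
    """Return the used percentage of the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # some pseudo file systems report no capacity
    return 100.0 * usage.used / usage.total


# "." is only an example; substitute any mount point or drive of interest.
print(f"{filesystem_utilization('.'):.1f}% used")
```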
Uncovering raw storage on hosts is often a little more difficult. Each Unix flavor has its own command for querying visible storage, be it Solaris' format, HP-UX's diskinfo or AIX's lsdev. These commands are intended to show visible storage units, not to collect utilization metrics. In storage area networks (SANs), a single LUN can be presented to a host multiple times to enable multipathing software to function. Many SANs also allow some hosts to see each other's LUNs. Simply adding up the total size of the LUNs shown by a command such as diskinfo can therefore overstate raw storage. You must use the device serial numbers shown by commands like EMC's syminq and AIX's lscfg to identify individual disks.
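The serial-number approach can be sketched as follows. This assumes the device paths, serial numbers and sizes have already been gathered into records by whatever means your platform offers; the DevicePath type, the raw_storage_gb helper and the sample values are hypothetical and don't reflect the output format of any particular command.

```python
from typing import Iterable, NamedTuple


class DevicePath(NamedTuple):
    path: str       # device name as the host sees it
    serial: str     # serial number reported for the underlying LUN
    size_gb: float


def raw_storage_gb(devices: Iterable[DevicePath]) -> float:
    """Sum device sizes, counting each serial number only once."""
    size_by_serial = {}
    for dev in devices:
        # Keep the first size seen for each serial; later paths are duplicates.
        size_by_serial.setdefault(dev.serial, dev.size_gb)
    return sum(size_by_serial.values())


# Hypothetical inventory: the first LUN is visible through two paths.
devices = [
    DevicePath("c1t0d0", "SN-0001", 36.0),
    DevicePath("c2t0d0", "SN-0001", 36.0),  # second path to the same LUN
    DevicePath("c1t1d0", "SN-0002", 72.0),
]

naive_total = sum(dev.size_gb for dev in devices)
print(f"naive sum: {naive_total:.0f}GB, by serial: {raw_storage_gb(devices):.0f}GB")
```

In this made-up inventory, the naive sum counts the multipathed LUN twice and overstates raw storage by 36GB.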
Like the Matryoshka dolls, there's more than one layer of storage utilization. Storage arrays can also abstract raw disk space into usable storage and present it to hosts, as illustrated in "Utilization can be a matter of perspective." To a storage array, space is used when it's dedicated for use by a host, whether or not it is part of a file system or even visible to the intended host. When they're installed, many SANs are configured to present all available storage to hosts according to their projected usage, and much of this goes unused. It's not uncommon to discover that 25% of assigned LUNs aren't in use.
Storage systems dedicated to single hosts are usually the least utilized. The most extreme cases are Windows application servers using small RAID systems. Often, these can't be specified small enough, since today's minimum hard disk size is 20GB. Many Windows applications need only a little storage, and there are often a multitude of these systems in house. I've seen many shops with 200 or more Windows hosts, each with 5% or 10% storage utilization. Attaching these hosts to shared storage wouldn't be practical because Fibre Channel (FC) host bus adapters (HBAs) often cost more than cheap internal RAID, and booting from a SAN isn't yet a widely accepted practice. The only route for improvement here is server consolidation.
Another potential utilization analysis focuses on host-resident logical volume managers. These packages abstract storage before it is used, taking raw storage and creating usable logical volumes. There's often a great deal of unused storage within volume groups, and measurement is difficult because it requires specialized commands such as Veritas' vxprint and HP's vgdisplay.
Finally, databases and applications often manage storage, and have utilization issues within files. Again, specialized commands are required to measure unused table spaces and empty files. A file system could be 95% utilized for storage of database files, but most of these could be empty.
A true measurement of utilization would reflect every layer of usage metrics, from raw disk in a shared array to used storage within files. Raw storage for each new frame of reference is contained within the used storage measured above it, so low utilization is compounded as we move deeper into the stack. Over the past few years, I've conducted numerous utilization analyses of enterprise systems. The average utilization I've observed is less than 25%, just from the usable storage on arrays to the used space within file systems. Counting database table space usage brings this percentage well below 15%.
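Because each layer's raw storage sits inside the used storage of the layer above it, end-to-end utilization is the product of the per-layer ratios. The sketch below illustrates the compounding; the layer names and ratios are hypothetical, chosen only to show the effect, and aren't the figures from the analyses mentioned above.

```python
from math import prod

# Hypothetical per-layer utilization, each expressed as used / usable
# within that layer's own frame of reference.
layers = {
    "array (allocated to hosts / usable)": 0.60,
    "file system (used / allocated)": 0.40,
    "database (table space used / file size)": 0.50,
}

overall = prod(layers.values())

for name, ratio in layers.items():
    print(f"{name}: {ratio:.0%}")
print(f"end-to-end utilization: {overall:.0%}")
```

No single layer in this example looks alarming on its own, yet only about an eighth of the array's usable storage ends up holding data.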
When a drive is full, the immediate problem for systems administrators is determining what the drive contains, and whether or not some of it can be removed. Many of today's storage resource management (SRM) applications focus on this question. They examine the contents of a file system and help identify unnecessary space usage. Once outdated, duplicate or unwanted files are discovered, they can be moved offline or deleted. Plain utilization will decrease as a result, but more space is created for new data, potentially increasing the average value of the data on that storage.
While file system analysis isn't a means of increasing utilization, most SRM packages go beyond this. All can display basic file system utilization reports in an accessible and friendly manner, but some of the latest SRM packages also include means of looking past the file system horizon. They include application agents that show database file utilization and storage system agents that report array utilization. While most are passive reporting tools, their reports can be extremely valuable to storage managers intent on improving utilization.
Without integrated SRM packages, data has to be collected from a multitude of places, integrated and analyzed by hand. Collecting array metrics requires vendor-supplied software, but data from these tools can be difficult to extract for processing. Some array data can also be collected by manually counting drive mechanisms or by analyzing asset databases. As mentioned, host information can be collected with commands such as df and format. Database administrators can provide application utilization information.
But bringing all of this data together can be complicated. Arrays, volume managers, hosts and applications often use different semantics to describe the same concepts. An analysis requires understanding what's meant by terms such as a LUN, plex and table space. If multiple platforms are in use, physical partitions need to be reconciled with physical extents and subdisks.
A tremendous value provided by third-party applications is their ability to hide this complexity and output just the valuable data.
Utilization can be a matter of perspective
From the host side, all of the storage on the array that is allocated for it is the pool of raw storage available. In the context of the array, that would be considered just one pool of used storage. You may vastly overestimate utilization if you don't go beyond host statistics.
Room for improvement
Fundamentally, utilization is a question of balancing the usage of a storage resource with the need to maintain a buffer for growth. If all growth were planned, just-in-time provisioning would allow storage resources to be added only when needed, and near 100% utilization would be possible. But budgeting and purchasing cycles, installation time and continual growth conspire to require a buffer of unused storage. For this reason, most managers overspecify storage requirements by 20% to 50%.
In many cases, buffers are specified all the way up the chain, from the database administrator through the storage manager. Let's say an application requires 10GB of storage. If everyone includes a 50% buffer for growth, the DBA will ask for 15GB, the systems administrator for 22.5GB and the storage administrator will buy 34GB. This will yield a true utilization rate of just 30%.
While no rule of thumb will fit all situations, a 20% buffer should suffice for many systems. This should provide adequate time to provision more storage, but if your purchasing timeframe is longer than a few months, a much larger buffer may be required. In our example, a 20% buffer gives us just over 17GB of storage purchased, yielding 58% utilization.
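The buffer arithmetic in this example can be checked with a short sketch. It assumes, as the example does, that each of three parties (DBA, systems administrator and storage administrator) applies the same buffer to the figure handed to them; the function names are our own.

```python
def purchased_gb(required_gb: float, buffer: float, layers: int) -> float:
    """Storage bought when each layer adds the same growth buffer."""
    return required_gb * (1.0 + buffer) ** layers


def true_utilization(required_gb: float, buffer: float, layers: int) -> float:
    """Percentage of purchased storage the application actually requires."""
    return 100.0 * required_gb / purchased_gb(required_gb, buffer, layers)


# The application needs 10GB and passes through three layers of buffering.
for buffer in (0.50, 0.20):
    bought = purchased_gb(10.0, buffer, layers=3)
    pct = true_utilization(10.0, buffer, layers=3)
    print(f"{buffer:.0%} buffer: {bought:.2f}GB purchased, {pct:.1f}% utilization")
```

The exact figures are 33.75GB for the 50% case and 17.28GB for the 20% case, which round to the 34GB and "just over 17GB" quoted above.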
One of the less obvious benefits of a shared utility model for storage (December 2002 Integration, "End to end management in sight") is a reduction of the amount of storage required to provide a growth buffer for unexpected requests. Most systems won't experience sudden growth, and the average rate for a number of systems will be more even and predictable. If a large storage environment is shared by a number of applications, a smaller buffer can be maintained, improving overall utilization dramatically.
A smooth and regular process of adding new space to a shared SAN can also reduce the planning timeframe. Most dedicated storage is specified for the life of an application, which can lead to tremendous waste in both disk space and dollars. "Buying storage in advance is wasteful" demonstrates the cost differences between purchasing decisions. The center two columns compare buying four years of storage at today's prices and waiting to buy each year's requirement at the current prices. Although both cases end up with 310GB of storage after four years, annual purchasing saves more than 30% of the cost.
This effect is magnified if the growth projections are incorrect. Many application managers aren't sure how much space they'll need in three years' time, so they overspecify to be on the safe side. In our example, slowing the growth to just 20GB per year saves an additional 2% over annual purchasing. But annual forecasting could also allow the storage manager to reduce the buffer to 20% after the first year, drastically reducing the amount of storage purchased and increasing the cost savings to 40%.
You may also be able to improve file system utilization by focusing on buffer requirements for individual hosts and file systems. On database servers, file systems often don't need to be expanded because new file systems will be created for new data files. Therefore, database file systems can be filled to almost 100% without concern. Similarly, many hosts won't experience much data growth, so their buffers can be reduced.
The most effective way to improve storage utilization for servers with external storage is simply to connect them to a shared SAN. A true SAN (as opposed to a system of dedicated storage using FC) allows unused storage to be reallocated to more needy systems dynamically. Growth is averaged across many systems, enabling frequent forecasting and provisioning. A true shared SAN managed as an internal utility can result in tremendous utilization improvements and cost savings.