Measuring data storage utilization

This article can also be found in the Premium Editorial Download "Storage magazine: Inside the new Symmetrix DMX model offerings."

Storage layers
Most storage managers focus on host file systems because they're easy to measure and highly visible. On Unix systems, the ubiquitous (but non-standard) df command shows usable and used storage space in file systems. On Windows systems, the same information is available in the drive properties window. In both cases, file system storage is the first line of visibility for both system managers and users.
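As a rough illustration of collecting this first layer, the sketch below parses df output and computes utilization per file system and overall. It assumes the POSIX portable format produced by `df -P -k` (one line per file system, sizes in 1024-byte blocks); the sample text and device names are hypothetical, not taken from a real host.

```python
def parse_df_posix(output):
    """Parse `df -P -k` style output into per-file-system records."""
    records = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)  # keep a mount point with spaces intact
        if len(fields) < 6:
            continue  # ignore lines that don't match the expected layout
        device, total_kb, used_kb, _avail_kb, _capacity, mount = fields
        total_kb, used_kb = int(total_kb), int(used_kb)
        records.append({
            "device": device,
            "mount": mount,
            "total_kb": total_kb,
            "used_kb": used_kb,
            "utilization": used_kb / total_kb if total_kb else 0.0,
        })
    return records


# Hypothetical sample output for illustration only.
SAMPLE_DF = """\
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/dsk/c0t0d0s0 8000000 2000000 6000000 25% /
/dev/dsk/c0t0d0s7 20000000 1000000 19000000 5% /export/home
"""

records = parse_df_posix(SAMPLE_DF)
total_kb = sum(r["total_kb"] for r in records)
used_kb = sum(r["used_kb"] for r in records)
overall = used_kb / total_kb  # host-wide file system utilization

for r in records:
    print(f'{r["mount"]:15s} {r["utilization"]:6.1%}')
print(f'{"overall":15s} {overall:6.1%}')
```

Note that this divides used space by total space, which can differ slightly from the Capacity column some df implementations print, because those often exclude reserved blocks.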

Uncovering raw storage on hosts is often a little more difficult. Each Unix flavor has its own command for querying visible storage, be it Solaris' format, HP-UX's diskinfo or AIX's lsdev. These commands are intended to show visible storage units, but they aren't designed for collecting utilization metrics. In storage area networks (SANs), a single LUN can be presented to a host multiple times to enable multipathing software to function. Many SANs also allow some hosts to see each other's LUNs. Simply adding up the total size of the LUNs shown by diskinfo can therefore overstate raw storage. You must use the device serial numbers shown by commands like EMC's syminq and AIX's lscfg to identify individual disks.
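The effect of multipathing on a naive total can be sketched in a few lines. The example assumes you have already gathered one record per visible device path, each carrying the serial number and size reported by the host tools; the record layout, device names, serial numbers and sizes are all hypothetical.

```python
def raw_capacity_gb(paths):
    """Sum LUN sizes, counting each device serial number only once."""
    size_by_serial = {}
    for path in paths:
        # Paths that share a serial number are the same physical LUN.
        size_by_serial[path["serial"]] = path["size_gb"]
    return sum(size_by_serial.values())


# Hypothetical inventory: the first two entries are two paths to one LUN.
paths = [
    {"device": "c2t0d0", "serial": "0001A", "size_gb": 36},
    {"device": "c3t0d0", "serial": "0001A", "size_gb": 36},
    {"device": "c2t0d1", "serial": "0001B", "size_gb": 72},
]

naive_gb = sum(p["size_gb"] for p in paths)  # counts the shared LUN twice
deduped_gb = raw_capacity_gb(paths)          # counts each serial once

print(f"naive total:   {naive_gb} GB")
print(f"deduped total: {deduped_gb} GB")
```

The same approach extends across hosts: pooling the records from every host before deduplicating avoids double-counting LUNs that several hosts can see.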

Like Matryoshka dolls, storage utilization has more than one layer. Storage arrays can also abstract raw disk space into usable storage and present it to hosts, as illustrated in "Utilization can be a matter of perspective." To a storage array, space is used when it's dedicated for use by a host, whether or not it's part of a file system or even visible to the intended host. When they're installed, many SANs are configured to present all available storage to hosts according to their projected usage, and much of this goes unused. It's not uncommon to discover that 25% of assigned LUNs aren't in use.
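One way to find assigned-but-idle LUNs is to compare the array's view with the hosts' view. The sketch below assumes two lists of LUN serial numbers, one exported from the array's assignment records and one gathered from hosts that actually have the LUN in a volume group or file system; both lists and the function name are hypothetical.

```python
def unused_assigned(assigned_serials, in_use_serials):
    """Return the assigned LUNs no host is using, and their share of the total."""
    assigned = set(assigned_serials)
    unused = assigned - set(in_use_serials)
    fraction = len(unused) / len(assigned) if assigned else 0.0
    return sorted(unused), fraction


# Hypothetical serial numbers for illustration only.
assigned_on_array = ["0001A", "0001B", "0001C", "0001D"]
in_use_on_hosts = ["0001A", "0001B", "0001C"]

unused, fraction = unused_assigned(assigned_on_array, in_use_on_hosts)
print(f"unused LUNs: {unused} ({fraction:.0%} of assigned)")
```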

Storage systems dedicated to single hosts are usually the least utilized. The most extreme cases are Windows application servers using small RAID systems. Often, these can't be specified small enough, because today's minimum hard disk size is 20GB. Many Windows applications need only a little storage, and there is often a multitude of these systems in house. I've seen many shops with 200 or more Windows hosts, each with 5% or 10% storage utilization. Attaching these hosts to shared storage wouldn't be practical because Fibre Channel (FC) host bus adapters (HBAs) often cost more than cheap internal RAID, and booting from a SAN isn't yet a widely accepted practice. The only route for improvement here is server consolidation.

Another potential utilization analysis focuses on host-resident logical volume managers. These packages abstract storage before it is used: they take raw storage and create usable logical volumes. Volume groups often contain a great deal of unused storage, and measuring it is difficult because it requires specialized commands such as Veritas' vxprint and HP's vgdisplay.

Finally, databases and applications often manage their own storage and have utilization issues within files. Again, specialized commands are required to measure unused table spaces and empty files. A file system could be 95% utilized by database files, but most of those files could be empty.

A true measurement of utilization would reflect every layer of usage metrics, from raw disk in a shared array to used storage within files. The raw storage available at each layer is only the storage counted as used by the layer above it, so low utilization compounds as we move deeper into the stack. Over the past few years, I've conducted numerous utilization analyses of enterprise systems. The average utilization I've observed is less than 25%, just from the usable storage on arrays to the used space within file systems. Counting database table space usage brings this percentage well below 15%.
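The compounding is simple multiplication: the end-to-end figure is the product of the utilization ratio at each layer. The sketch below shows the arithmetic with made-up per-layer ratios chosen only to illustrate the effect; they are not measurements from the analyses described above, and the layer names are assumptions.

```python
from itertools import accumulate
from operator import mul

# Hypothetical share of each layer's available space that the next layer uses.
layers = [
    ("allocated to hosts", 0.75),
    ("placed in logical volumes", 0.70),
    ("used within file systems", 0.45),
    ("used within database files", 0.60),
]


def cumulative_utilization(layers):
    """Return (layer name, utilization relative to usable array storage)."""
    ratios = [ratio for _name, ratio in layers]
    running = accumulate(ratios, mul)  # running product down the stack
    return [(name, total) for (name, _ratio), total in zip(layers, running)]


results = cumulative_utilization(layers)
for name, total in results:
    print(f"{name:30s} {total:6.1%}")
```

With these illustrative ratios, no single layer looks alarming on its own, yet the running figure falls to roughly 24% at the file system layer and roughly 14% once database files are counted.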

Utilizing software
When a drive is full, the immediate problem for systems administrators is determining what the drive contains and whether some of it can be removed. Many of today's storage resource management (SRM) applications focus on this question. They examine the contents of a file system and help identify unnecessary space usage. Once outdated, duplicate or unwanted files are discovered, they can be moved offline or deleted. Plain utilization decreases as a result, but more space is created for new data, potentially increasing the average value of the data on that storage.

While file system analysis isn't a means of increasing utilization, most SRM packages go beyond this. All can display basic file system utilization reports in an accessible and friendly manner, but some of the latest SRM packages also include means of looking past the file system horizon. They include application agents that show database file utilization and storage system agents that report array utilization. While most are passive reporting tools, their reports can be extremely valuable to storage managers intent on improving utilization.

Without integrated SRM packages, data has to be collected from a multitude of places, integrated and analyzed by hand. Collecting array metrics requires vendor-supplied software, but data from these tools can be difficult to extract for processing. Some array data can also be collected by manually counting drive mechanisms or by analyzing asset databases. As mentioned, host information can be collected with commands such as df and format. Database administrators can provide application utilization information.

But bringing all of this data together can be complicated. Arrays, volume managers, hosts and applications often use different semantics to describe the same concepts. An analysis requires understanding what's meant by terms such as a LUN, plex and table space. If multiple platforms are in use, physical partitions need to be reconciled with physical extents and subdisks.

A tremendous value provided by third-party applications is their ability to hide this complexity and output just the valuable data.

This was first published in February 2003
