
Why does my EMC 45 TB drive yield only 41 TB of usable storage?

Storage expert Ashley D'Costa answers a reader's question: "When is a 300 GB drive not a 300 GB drive?"

When is a 300 GB drive not a 300 GB drive? I had known about the importance of dealing with "usable" vs. "raw," but what I did not know was that every vendor uses base-10 to generate their quotes while the field engineers use base-2. I was particularly surprised to find that the Hitachi GUI tool "Resource Manager" also works on base-2, thus misleading the storage guys to think they have more gigabytes than they do! A recent 45 TB purchase from EMC actually yielded 41.3 TB of usable storage. Comments?
Yeah, I can just imagine the frustration when you're trying to cram those last few downloaded bytes from Nickelback's latest, only to discover that you've been short-changed those last few vestiges of your post-grunge melody. And not by a few bits of rounding error, but by several whole terabytes!

To be a bit more precise, you would end up with only about 93% of what you were expecting if we're talking gigabytes. Why...

93%? That's the ratio between a gigabyte expressed in base-10 (1,000,000,000 bytes) and a gigabyte expressed in base-2 (1,073,741,824 bytes). But then that's your basic question: Which is it, base-10 or base-2?
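If you want to verify that ratio yourself, a quick back-of-the-envelope sketch in Python (the quantities are exact powers, so any language would do):

```python
# A "decimal" gigabyte as used on spec sheets vs. a "binary"
# gigabyte as used by most operating systems.
decimal_gb = 10**9   # 1,000,000,000 bytes
binary_gb = 2**30    # 1,073,741,824 bytes

ratio = decimal_gb / binary_gb
print(f"{ratio:.1%}")  # roughly 93.1%
```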

What it boils down to is how a gigabyte (or megabyte or yottabyte or whatever-byte) is defined, not just by storage manufacturers and industry experts (ahem), but also by computer scientists (ahem, again), who ultimately started all of this. It's a complex and sordid story. Now that you're riveted, here goes.

Although how many bits make up a byte varied through the early history of computer science, today it is generally accepted that a byte is 8 bits (a convention that took hold with IBM's System/360 architecture, back in the day when everyone else was distracted by flower power).

However, as computing evolved and memory capacities grew to thousands and millions of bytes, the need arose to express these larger capacities more conveniently. As in other fields of science, the metric (or SI) prefixes were chosen (kilo, mega, giga, etc.). This logical and seemingly harmless decision was the start of all the problems, for memory capacity was (and still is) built in powers of two, while the metric system represents numbers in powers of ten.

How could my computer science predecessors not know this? In truth, they did. However, the difference between a base-2 kilobyte and a base-10 kilobyte is only 24 bytes. No one at the time believed that within a few short years people would be using the word gigabyte. But Moore's Law had its way, and here we are, with the discrepancy between base-2 and base-10 growing ever larger.

So, history lesson aside, what is the definition of a gigabyte? As defined by the IEEE, it is 1 billion bytes (1,000,000,000), per the correct usage of metric (SI) prefix notation. But in practice, it depends. If you're talking computer memory or, more crucially, file space, for reasons that will soon become apparent it's considered to be 2^30 bytes (1,073,741,824 bytes).

If you're talking storage (hard disks, flash drives, etc.), it's considered to be 1 billion base-10 bytes (1,000,000,000). So technically, storage manufacturers are using the term "gigabyte" correctly as defined by standards bodies, and a 300 GB hard drive is, in fact, exactly 300 billion bytes.
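To see what that means in practice, here is a small Python sketch converting a marketed (base-10) capacity into the figure an operating system would typically display; the function name is just for illustration:

```python
def os_reported_gb(marketed_gb: float) -> float:
    """Convert a marketed decimal-GB capacity into the
    binary-GB figure an operating system displays."""
    capacity_bytes = marketed_gb * 10**9  # what you bought
    return capacity_bytes / 2**30         # what the OS shows

# A "300 GB" drive shows up as roughly 279.4 GB in the OS.
print(round(os_reported_gb(300), 1))
```

Same drive, same bytes; only the unit of account differs.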

So why does it not appear like you have 300 billion bytes available to you when you try to save your latest tunes or grow that SAP database? The reason is that your computer's operating system calculates file consumption and available space in base-2 binary, not base-10 decimal. So a kilobyte, megabyte or gigabyte as reported by most, if not all, operating systems is based on powers of 1024, not powers of 1000. Operating systems do this because computer CPU and memory architectures are built on base-2 math; computers therefore store information in base-2-sized segments, not base-10-sized segments.
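The same arithmetic at the terabyte scale (a binary terabyte being 2^40 bytes) sketches what happens to the reader's 45 TB purchase; the exact usable figure observed in the field also depends on how the array and its management tools measure capacity, which this sketch doesn't model:

```python
# The reader's 45 TB array, re-expressed in binary terabytes.
marketed_bytes = 45 * 10**12   # 45 decimal terabytes
binary_tb = marketed_bytes / 2**40

print(round(binary_tb, 1))  # roughly 40.9
```

That conversion alone lands in the same ballpark as the 41.3 TB the reader saw.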

In other words, computers save files onto their storage in 1024-byte chunks because they process the information in 1024-byte chunks, so operating systems report information this way as well. If they reported kilobytes in 1000-byte units rather than the 1024-byte chunks actually saved, you'd end up with fractional answers, rounding errors and even more confusion as the operating system tried to report values that read back conveniently.

In the end, it looks like the storage manufacturer here is not at fault and you can evaporate any thoughts of financial compensation for being ripped off. Your 300 GB hard drive does, in fact, hold 300 billion bytes and when it gets full, you are truly consuming 300 billion bytes. It's just in bigger chunks than you originally thought. Sadly, it's your own computer that's taken that huge "byte" out of your 300 gigabytes. (Sorry, I couldn't resist).
