
Disk vs. tape, cont’d (ad nauseam)

Disk vs. tape is not a new argument, but it takes on different permutations over time, especially as disk-based backup in its various forms gains popularity and as new technologies such as data deduplication bring some of the economics of disk closer to those of tape.

One theme I’ve heard cropping up in this discussion among high-end vendors lately is the idea of people in large enterprises deploying vast amounts of disk for backup, then realizing the cost inefficiencies, and space and power requirements of disk, and finally running back to tape either alongside or as a replacement for disk.

This back-and-forth popped up again in a post written by IBM’s Tony Pearson in response to a post written by Hitachi Data Systems’ Hu Yoshida. Yoshida’s post referred to a conversation with a storage admin at SNW who said his robotic tape libraries were actually drawing more power than his enterprise VTL.

This idea makes Pearson sputter:

I am not disputing [the] approach. It is possible that [the user] is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library. In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the meanwhile the robot is idle waiting for another request.

The weird thing is, I’ve heard plenty of vendors debating this of their own accord, usually taking sides along product lines with tape-centric vendors taking the position Pearson did, and vendors who sell disk for secondary storage taking the opposite view.

But I’m curious. I’m sure there’s some middle ground where the advantages and disadvantages just depend on personal preferences. But might there really be a trend here? Are users finding problems with disk-based systems and re-integrating tape? How many organizations really even left tape totally behind to begin with? And how do new data reduction/power reduction technologies change the equation? One thing not addressed by either Pearson’s or Yoshida’s post is where MAID might come into this argument, as well as the potential combination of MAID and dedupe.

Join the conversation



Beth... I responded to the arguments Tony made here: Now I am not saying that tape is good or bad for database backups in that column, just pointing out that Tony's reasoning is a bit specious. Having said that, I would say that we have observed dedup ratios with database to be as good or better than dedup ratios with other data sets. So for everything but deep archival retention, my suspicion is that we will see a fairly convincing swing in the market to disk (with dedup) for db backup.
Beth - you bring up excellent questions. I just got back from an Imation conference where all that was talked about was tape (BY CUSTOMERS: Bank of America, Citigroup, FedEx, JPMorgan Chase, Wachovia, etc.) I once asked a Sun archive sales person what they did when a customer says they want to get rid of tape. His answer was, "We sell them a 100% disk-based solution and then call back 12 months later - THEY usually bring up tape then..." Here is the bottom line that has not been addressed in this debate: 1. The ONLY people who bring this debate up are 100% disk sellers & vendors. 2. They would NEVER be so passionate about this debate if tape truly was dead - why would they even spend the time if users were not using tape? 3. If large customers like the ones above truly got rid of ALL tape - they would have to buy A LOT more disk MORE frequently - who would benefit here? The mere fact that disk-only vendors drum the "Tape is Dead" beat shows customers that disk-only vendors have an interest in getting rid of it for their own financial gain (not the customers'). Customers are better off talking about this issue with companies that sell both disk and tape - if they want more objectivity out of a vendor, that is.
Beth, You gotta love the on-going (for the past couple of decades now) “tape is dead” type discussions and debates along with those associated with other “zombie” technologies including “printers are dead as we are in a paperless society” (don’t tell HP shareholders that) or that the “mainframe is dead” (don’t tell IBM customers and shareholders that) or one of the newer ones like “Fibre Channel is dead” as has been heard for 10 years or that the disk drive is now dead (hey, it’s 50 years old), all of which make for good press and debate, particularly pertaining to storage in an election year. The beauty of “zombie” technologies is that they are not the newest or shiniest and thus don’t have the marketing and PR dollars thrown at them; however, they work, customers buy them, and manufacturers love them as they work and can be sold at a profit without having to spend lots of marketing dollars on them. The general notion that a VTL + Dedupe would draw less power than a traditional tape library is laughable and hence the need for a good screen protector privacy filter like those from 3M among others. However, as is often the case, say you have a tape library with 20 tape drives that are generally in use, and then compare to a small dedupe VTL with say only 12 SATA drives; sure, it’s possible with some creative configuration to make a VTL + dedupe draw less power than a tape library, however is it an apples to apples comparison? Hardly! Take a look at a given raw capacity size and make a baseline comparison for power, cooling, floor-space, environmental (PCFE) impact of VTLs, tape libraries, MAID and traditional disk storage systems that factors in not only storage (raw) capacity but also performance when it comes time to store or retrieve data, and the multi-dimension picture becomes rather interesting and puts the different tiers of storage more into perspective.
For example, check out the industry trends and perspectives report "Energy Savings Without Performance Compromise" at as an example (I need to update and expand the charts to add some additional solutions) of how effective tape libraries can be compared to even MAID and MAID 2.0 solutions with regard to addressing PCFE issues while supporting various service levels including performance, availability, capacity and energy use. Now granted, the de-dupers will cry foul, as I would expect them to, in that the baseline approach does not show effective capacity improvements when de-dupe is applied to their solutions. OK, fair enough; however, first show the base-line without de-dupe or compression, then show the same solutions with de-dupe and/or compression applied for an apples to apples, oranges to oranges comparison vs. the more normal mode we see, which is apples to oranges in forced mis-match scenarios. Cheers Greg Schulz – and
Beth - I agree with you. As with much of the technology evolution we continue to witness, with newer technologies maneuvering to establish their place in existing data storage hierarchies, there is indeed a productive middle ground. In the case of MAID, this is a technology that can effectively augment the existing infrastructure and be a complementary addition rather than being positioned as a combative replacement for traditional tape solutions. Responding to Greg - it is not laughable to say that a disk-based VTL could have lower power and cooling requirements than an enterprise tape library. Assuming of course the VTL is based on MAID - not a compromise that merely spins platters down but one that actually powers drives off. It is interesting to note that in a MAID-based Virtual Tape Library, the loading and unloading of tape cartridges is directly mapped to the MAID array. As access to Virtual Tape Cartridges is enabled via the “tape load command”, the subsequent “disk spin-up command” is executed on the MAID array. When the access to the tape data has been completed, the “tape unload command” engages the MAID array to “spin down” the disk array. A MAID array has the same power consumption as tape when the tape data is not being accessed. MAID is the ultimate in power efficiency for on-line storage solutions and matches that of tape. Why would you not include data deduplication in the comparison? That is like saying you should not consider compression when calculating the effective storage capacity of a particular solution. Looking at raw capacity may be an interesting data point, but if a solution supports data deduplication, the supportable storage density is significantly increased, delivering all the associated economic and operational benefits. The apples to apples comparison should be based on the benefits delivered to the end user and not the nuances of the technology.
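To make the load/unload mapping described in the comment above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class name `MaidBackedCartridge`, the `DriveState` enum and the event strings are all hypothetical and do not come from any vendor's product or API.

```python
from enum import Enum


class DriveState(Enum):
    POWERED_OFF = "powered_off"
    SPINNING = "spinning"


class MaidBackedCartridge:
    """Hypothetical virtual tape cartridge mapped to one MAID drive group."""

    def __init__(self, label):
        self.label = label
        self.state = DriveState.POWERED_OFF  # drives draw no power while idle
        self.events = []  # commands this sketch would send to the array

    def load(self):
        # A "tape load" is translated into a "disk spin-up" on the mapped drives.
        if self.state is DriveState.POWERED_OFF:
            self.events.append("spin_up")
            self.state = DriveState.SPINNING

    def unload(self):
        # A "tape unload" is translated into a "spin down" once access is done.
        if self.state is DriveState.SPINNING:
            self.events.append("spin_down")
            self.state = DriveState.POWERED_OFF

    def read(self):
        # Data is only reachable while the cartridge is loaded (drives spinning).
        if self.state is not DriveState.SPINNING:
            raise RuntimeError("cartridge must be loaded before reading")
        return f"data from {self.label}"
```

The point of the sketch is that the backup application only ever issues ordinary tape commands; the power state of the disks follows from them.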
>> Bill for an apples to apples comparison, you would want to look at the effective benefits delivered including performance/throughput to ingest as well as re-inflate or restore data (e.g. mean time to restore), effective storage capacity for different workloads and data types including un-compressed or base-line (for data that does not compress well or at all), compressed (for data that does compress) as well as de-duped (for recurring data that is de-dupable) or a combination. Certainly then you would also want to look at the power, cooling, floor-space, environmental (PCFE) impact including total power required at startup, normal running, low power or powered down modes for circuit sizing of not only the back-end storage system but also the de-dupe or compression appliances and their associated buffer storage if applicable or as needed by some solutions. Assuming that a common de-dupe engine (software and appliance) is used across different "back-end" storage systems and that the appliance plus software are not a bottleneck, then you should in theory see consistent effective improvements on storage capacity across different "back-end" storage systems. Then the discussion would turn back to how the "back-end" storage systems differ, for example your MAID array powers down to save power, however what is the performance difference when data needs to be restored and accessed? Likewise as part of the PCFE discussion and comparison, also consider what if any additional storage is needed as part of a buffer to support de-dupe processing and its power consumption, or, can the de-dupe appliance use the "back-end" storage natively to help reduce costs without performance or PCFE impact. So the apples to apples should be for base-line as well as for enhanced data protection, data footprint reduction using both compression and de-duping across different data types and workloads, factoring in not only capacity but also performance for on-line vs. on-line, off-line vs. off-line to meet different needs. Cheers GS
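The baseline-then-reduced comparison argued for above can be sketched as a small Python calculation. Everything here is an assumption for illustration: the `StorageOption` class is hypothetical, and the capacity, wattage and reduction figures are made-up placeholders chosen only to exercise the arithmetic, not measurements of any real tape library or VTL.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    raw_tb: float                  # raw capacity in terabytes
    watts: float                   # power draw in normal running mode
    reduction_ratio: float = 1.0   # 1.0 means no dedupe or compression

    def raw_tb_per_kw(self):
        # Baseline view: raw capacity per kilowatt, ignoring data reduction.
        return self.raw_tb / (self.watts / 1000.0)

    def effective_tb_per_kw(self):
        # Enhanced view: the same figure with dedupe/compression applied.
        return self.raw_tb * self.reduction_ratio / (self.watts / 1000.0)


# Placeholder inputs, not vendor data.
options = [
    StorageOption("tape library", raw_tb=100.0, watts=1000.0),
    StorageOption("dedupe VTL", raw_tb=100.0, watts=2000.0, reduction_ratio=10.0),
]

for opt in options:
    print(opt.name, opt.raw_tb_per_kw(), opt.effective_tb_per_kw())
```

Reporting both columns side by side keeps the baseline and the data-reduced view separate. A fuller comparison would add the restore-time and appliance power terms that the comment also lists.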
The benefit of MAID is the savings in power consumption from turning off the disk drives and extending the drive service life…the implementation by an application (e.g. dedupe, compress) would be to have the data drives available upon access by the application. The “back-end” storage, in the case of MAID, would provide the savings of power without the penalty of “inaccessibility” to the data. The key access behavior required by the ‘end-user’ is the accessibility to all their data. As data sets (e.g. objects, files, volumes, containers, etc.) and disk drive capacity grow, the ability to retrieve the data is relative to the ability to access (r/w) the data and the capacity of the data system. For instance, retrieving data (all or some) from a 500GB SATA drive is faster by a factor of 2x or more than retrieving the data from a 1000GB SATA drive. This is assuming that the spindle speed is the same in each of the drives. A disk drive, when requested to read or write data, will need to prioritize the heads and actuator to successfully perform the request. In this case, the disk drive is managing the queue of requests, the location of the heads and the r/w circuitry. Thus the result is milliseconds of delay for each request. A MAID array manages the access to the data on the set of “active/inactive” drives over a set of data that is 1,000 times larger than a single disk drive. The result of an access to data in a MAID array is managed based upon the availability of the active drives. In some cases the disk drives are active, which will result in the request being queued to the drive. In the case that the drives were in a “power savings” mode, the drives are activated and the request is queued to the drive. MAID provides for the “maximum capacity” with “peak access” to the large sets of data (PBs, EBs, etc.). CTS
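The 500GB vs. 1000GB point above can be checked with simple arithmetic for the case of reading a whole drive end to end. This is a rough sketch only: the function name is hypothetical, the 100 MB/s sustained throughput is an assumed round number rather than a figure from the comment, and seek and queueing delays are ignored.

```python
def full_read_seconds(capacity_gb, throughput_mb_per_s):
    """Time to stream an entire drive at a constant sustained rate."""
    return capacity_gb * 1000 / throughput_mb_per_s


# Assumed: both drives sustain the same 100 MB/s (same spindle speed).
small = full_read_seconds(500, 100)    # 500GB drive
large = full_read_seconds(1000, 100)   # 1000GB drive
print(small, large, large / small)
```

At equal throughput the full-drive read time scales linearly with capacity, which is where the factor of two in the comment comes from.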
How does Quantum's product portfolio stack up against its competitors when we look towards the future? Are they selling legacy equipment??