Weigh private vs. public cloud for cold data storage

The private cloud versus public cloud decision for cold data storage hinges on the amount of data, the length of storage and access needs, a consultant advises.

Private clouds and public clouds are two of the most cost-effective options for the cold storage of inactive data that organizations rarely, if ever, expect to access.

The decision hinges on such factors as the amount of cold data at issue, the length of time the data needs to be kept and the number of times the information may need to be accessed, according to Marc Staimer, president and founder of Dragon Slayer Consulting in Beaverton, Oregon.

In this podcast interview, Staimer supplies the questions that an enterprise needs to ask when evaluating private versus public cloud, the hidden costs associated with public cloud storage, and the merits of object storage with erasure coding and Linear Tape-Open (LTO) tape with the Linear Tape File System (LTFS) for cold data storage.

What issues does an enterprise need to consider when weighing public cloud versus private cloud for cold data?

Marc Staimer: You have to take into account a number of factors, such as how long you are going to keep that data. What technology changes are going to occur in that time frame? What is your typical tech refresh?

If you look at some of the technologies that people use for long-term cold storage, such as tape or optical, periodically you've got to change out your technology and rewrite your data or you're not going to be able to read your data over the long term because the equipment will break down, you won't be able to get it repaired, etc.

So, it comes down to how long you intend to keep the data, what kind of technology you're using, and whether or not that technology will be available or capable of reading the data down the road. If you're keeping data for 30 years, 40 years, 100 years, you've got to be planning for a technology refresh if you're doing it privately. If you're looking at a public provider, then you want to make sure that provider is viable enough to be there in 30, 40, 50, 100 years. In that case, you don't have to worry about the technology; instead, you're paying month after month for the data to be stored.

Typically, you have to recognize that cold storage is a type of storage where you're not accessing the data very often -- maybe once every couple of years. So, the cost factors come into play here. How is the public service provider charging you? Is the data going to be online, or how long will it take to get it from offline to online? There are times when, if you put pencil to paper and do the math, it's going to be less expensive to run with the public service provider than it is to do your own.
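The pencil-and-paper math Staimer describes can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the function names, the private-side prices, the refresh interval and the capacity are hypothetical placeholders, and the model ignores data growth, retrieval fees, price declines and the time value of money.

```python
def public_cloud_cost(capacity_gb, years, price_per_gb_month):
    """Total storage fees paid to a provider over the retention period."""
    return capacity_gb * price_per_gb_month * 12 * years


def private_cloud_cost(capacity_gb, years, hardware_per_gb,
                       refresh_interval_years, opex_per_gb_year):
    """Hardware bought up front and again at each tech refresh, plus running costs."""
    # Number of hardware purchases: the initial one plus one for every
    # refresh that falls inside the retention period (ceiling division).
    purchases = -(-years // refresh_interval_years)
    capex = capacity_gb * hardware_per_gb * purchases
    # Power, cooling, floor space and staff, lumped into one yearly rate.
    opex = capacity_gb * opex_per_gb_year * years
    return capex + opex


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not quoted prices.
    capacity_gb = 500_000
    years = 30
    public = public_cloud_cost(capacity_gb, years, price_per_gb_month=0.01)
    private = private_cloud_cost(capacity_gb, years, hardware_per_gb=0.05,
                                 refresh_interval_years=5,
                                 opex_per_gb_year=0.02)
    print(f"public:  ${public:,.0f}")
    print(f"private: ${private:,.0f}")
```

Changing the refresh interval or the yearly operating rate can flip which option comes out cheaper, which is why the comparison has to be redone with an organization's own numbers.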

We see costs advertised of a penny per gigabyte per month for cold data with public cloud storage. But what does an enterprise need to factor in when they try to compute the actual cost for that cold storage?

Staimer: Are they charging you to store it and access it or just store it? If there's a combination of storing and accessing, you need to determine how often you will access the data and how much of that data you will access. I'll give you an example. Amazon Glacier charges a penny per gigabyte per month to store the data, and you can recover [a certain percentage] of the data per day at no cost. So, if you're going to recover or view less than that in a given day, there is no additional cost to take into consideration. If, on the other hand, it's going to be more than that -- and you've got to calculate how often you're going to do it [and] what amount of data you're going to be recovering or viewing -- then that's got to be added into that price point. Other vendors charge one fee to store it and no additional fees to read it.
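A simplified model of that kind of pricing might look like the sketch below. Only the penny-per-gigabyte storage price comes from the interview. The free daily fraction is left as a parameter because the interview does not give the figure, and the retrieval price and function name are hypothetical. This is not Amazon's actual billing formula, so check the provider's current price list before relying on any estimate.

```python
def monthly_bill(stored_gb, retrieved_gb_per_day, days_with_retrieval,
                 storage_price_per_gb, free_daily_fraction,
                 retrieval_price_per_gb):
    """Estimate one month's bill: a storage fee plus any retrieval overage."""
    storage_fee = stored_gb * storage_price_per_gb
    # Amount that can be pulled back each day at no charge.
    free_gb_per_day = stored_gb * free_daily_fraction
    # Only the portion above the free allowance is billed.
    billable_gb_per_day = max(0.0, retrieved_gb_per_day - free_gb_per_day)
    retrieval_fee = (billable_gb_per_day * days_with_retrieval
                     * retrieval_price_per_gb)
    return storage_fee + retrieval_fee


if __name__ == "__main__":
    # free_daily_fraction and retrieval_price_per_gb are made-up inputs.
    within_allowance = monthly_bill(100_000, 10, 2, 0.01, 0.001, 0.05)
    over_allowance = monthly_bill(100_000, 600, 2, 0.01, 0.001, 0.05)
    print(within_allowance, over_allowance)
```

Staying inside the free allowance leaves the bill equal to the storage fee alone; exceeding it adds a charge that grows with both the size and the frequency of the restores.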

What can an IT organization do to keep the cost of private cloud storage comparable to or lower than the cost of public cloud storage such as Amazon Glacier?

Staimer: It comes down to technology. Technology's going to change. The biggest example of this is again in the tape and optical fields. I'm going to deal with tape first. Take LTO. LTO is considered the standard bearer for tape, and it's a great technology. But typical tape drives can only read tapes from the two previous generations. So, if you're at LTO 6 today, you can read LTO 4 tapes, but don't count on reading LTO 3, LTO 2 or LTO 1 tapes. That's changing because now there is something called LTFS, which is Linear Tape File System. LTFS sits in front of the drive. Therefore, your access to the data on the tapes in those drives -- and let's say they're in a tape library behind it -- is no longer directly [tied] to the drive. That doesn't mean you don't need to upgrade those drives over time, but this makes it a much simpler process. You could hold onto technology longer and therefore reduce costs.
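The read-compatibility rule of thumb from the interview is easy to turn into a planning check. The sketch below assumes the two-generation rule exactly as Staimer states it; the function names are hypothetical, and the vendor's compatibility matrix for a specific drive remains the authority.

```python
READ_BACK_GENERATIONS = 2  # rule of thumb quoted in the interview


def can_read(drive_generation, tape_generation):
    """True if a drive of one LTO generation can read a tape of another."""
    if tape_generation > drive_generation:
        # A drive cannot read media newer than itself.
        return False
    return drive_generation - tape_generation <= READ_BACK_GENERATIONS


def unreadable_after_upgrade(tape_generations, new_drive_generation):
    """List the tape generations that would need migrating before an upgrade."""
    return sorted(g for g in set(tape_generations)
                  if not can_read(new_drive_generation, g))


if __name__ == "__main__":
    archive = [1, 2, 3, 4, 5, 6]
    print(unreadable_after_upgrade(archive, new_drive_generation=6))
```

Running a check like this against a tape inventory before each drive purchase shows how much data would have to be rewritten, which is the refresh cost the interview warns about.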

With something like object storage using spinning drives, technology changes over time are again totally transparent to the data, so you don't have those costs of actually moving data [and] copying data over and over and over again. It never has to be copied again even if you're adding nodes with new drives, or taking out old drives or drives that are failing. It doesn't matter.

So, it depends on the technology that you're utilizing. If you're using object storage and erasure coding, let's say, you're in really good shape there. If you're using LTFS, you're also in pretty good shape. Optical? There isn't anything there yet. And pretty much down the road, you're going to see media along the lines of flash that has a price point below that of spinning drives and a higher density. We're looking at a whole change of different kinds of technologies and media. Ultimately, it's going to come down to the software that sits in front of it that will keep your costs down.

What types of storage or types of media do you think are especially helpful in building out a cheap private cloud infrastructure for cold storage if a company doesn't have an army of engineers like Facebook, Amazon or Google? Are there specific products or tool sets or skill sets that they need?

Staimer: I'm a very big fan of object storage with erasure coding. I think that's an exceptional technology for keeping your data for very long periods of time, including many, many decades, and keeping it viable to be read even though the underlying technology behind it may fail over time. The data still stays resilient and durable. I like that.
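To make the idea of erasure coding concrete, here is a minimal sketch using single XOR parity, the simplest possible erasure code. It is an illustration only: production object stores typically use Reed-Solomon-style codes that survive several simultaneous losses, and the function names here are hypothetical. The principle is the same, though: data is split into fragments, redundancy is computed across them, and a lost fragment is rebuilt from the survivors.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal-size fragments and append one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad to a multiple of k
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost fragment")
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    # XOR of all surviving fragments equals the lost one.
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


if __name__ == "__main__":
    stored = encode(b"cold data kept for decades", k=4)
    stored[2] = None  # simulate a failed drive or node
    print(recover(stored))
```

Each fragment would live on a different drive or node, so replacing failed or aging hardware means rebuilding only the fragments it held rather than recopying the whole archive.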

I see something similar occurring, although not quite at that level, with LTFS on the tape side. So, I've become more of a fan of that in tape as well. And in both cases, you don't need huge levels of expertise. So much of that is built into the software, and either of those technologies -- depending on your goals and what you're most familiar with -- will probably suit [your] needs in a private environment.

Facebook has talked a lot about cold storage, and they started the Open Compute Project with the goal of building out infrastructures at the lowest possible cost. Is there anything the average IT organization can take away from the Facebook/Open Compute Project efforts?

Staimer: What Facebook's really aiming at is to come up with a cost-effective way to do [cold storage] on flash. They'd like to move away from spinning disks. That's part of that Open Compute organization. I expect the media will be there sometime in the next 12 to 18 months. Having said that, let me make this clear. It's not the media that matters. It's the software in front of it. If I'm using object storage and I'm using flash underneath, I can pretty much count on the durability. If I'm just using, let's say, flash, flash is not the same kind of storage medium as tape or disk. It's not magnetic. It's electrical. It's capturing electrons. Eventually those electrons leak out, so you have to have some kind of software besides error detection and correction that makes sure the data you're storing on flash stays resilient and durable for long periods of time. So, although I could see it being capable on the media side of being at the cost points that someone like Facebook is looking for, it's still going to require something like object storage in front of that to provide the durability they require.

The Open Compute organization, I think, is a great idea to get more and more vendors working together to solve these very difficult, intractable problems. But, at the same time, you have to look at what software is going to go in front of the hardware.

In the final analysis, what's your top piece of advice for enterprises weighing whether to put their cold data in public or private cloud storage?

Staimer: First and foremost, you have to determine how much data you're going to store over what period of time. Granted, it's a difficult thing to do, and you're going to make lots of suppositions that may or may not be correct. But, still, you have to start somewhere.

Second, you've got to determine how frequently you're going to access that data.

Third, you've got to determine where you're going to keep it: whether or not you have the physical space to keep that data, taking into account the density gains we're seeing on an ongoing basis in storage; the power, the cooling, the infrastructure and the people if you're going to do it by yourself; and whether the technology will deliver what you require.

You need to compare that to what's available from the public cloud providers and what will be the most cost-effective and convenient -- because convenience is a big factor -- in providing that cold storage. I think you're going to find as you do this that private and public are both viable. It just depends on which pieces of the puzzle are most important to you.
