Cloud file storage pros and cons

Discover advantages and disadvantages of cloud file storage, what vendors are in the market, and what environments work well with file storage in the cloud.

Cloud storage appeals to small- and medium-sized businesses (SMBs) and larger enterprises as a less-expensive alternative to equipping and managing a full-blown storage system. Two possible uses for the technology are cloud file storage and cloud backup. Imagine being able to move years-old PowerPoint presentations and human resource manuals from your local file system into an inexpensive, cloud-based storage system that someone else manages and secures for you.

In this podcast, Jeff Boles, senior analyst and director, validation services at Hopkinton, Mass.-based Taneja Group, discusses the advantages and disadvantages of cloud storage, who the top vendors and cloud storage service providers are, and which environments shouldn't use cloud storage. Read the Q&A transcript below or listen to the podcast on file storage in the cloud.

Is the cloud a viable alternative for file storage?

Boles: Yes. There are a couple of things about cloud storage that make it really cool and particularly good for file storage. Obviously there are some benefits to pay-as-you-go storage: you don't have to buy big, expensive iron and repeatedly replace your storage just to store a lot of the data that's out there, no matter what that data is. When you add management and operational overhead to the cost of that storage, even the cheapest storage you can get is still pretty expensive.

A multi-tenant infrastructure that operates at massive scale, and where you pay only for what you use, can be pretty disruptive from a cost perspective. But beyond pure costs, storing data in the cloud is pretty interesting because it gives you more versatile access to data, and can elevate protection way beyond what you might be able to do with your own data centers.

Consequently, the cloud is gaining quite a bit of traction, and there are a couple of places where it is gaining traction in the biggest way: deep archive of file data, for instance, as well as backup and data storage behind Web applications. At the same time, we have a ton of vendors innovating and trying to make cloud storage more accessible by making it look like traditional storage in the data center.

We call products that do this 'gateways.' The list includes folks like Cirtas Systems, CTERA Networks, TwinStrata and others, and even Nirvanix provides its own cloud storage gateway.

Products from those vendors make cloud storage look just like the iSCSI or NAS storage that sits in your data center today. They can give you an easy on-ramp for getting your data into the cloud, and let you use the cloud the same way you use storage in your data center today.

What types of environments work well with cloud-based file storage and which do not?

Boles: Obviously, cloud storage is still at the remote end of a wire and, consequently, there aren't too many folks arguing that it's a perfect place for those I/O-intensive workloads like boot images or databases, for instance. Some of the new gateway vendors are doing quite a bit to take that latency out of the picture, but that doesn't mean you can go out and replace your Symmetrix or V-MAX with the cloud gateway today.

Consequently, the best uses of these technologies are for relatively inactive data that consumes lots of space. This includes longer-term backups, deep file archives and lower-priority files. Now you'd think with something like backup, the cloud wouldn't necessarily be the best place. After all, it's a tremendous amount of data to transfer, and when you need a backup, you'd better have it back right that second. But there are a couple of vendors who are both caching and optimizing the cloud connection so that recent data is local, and cloud-stored data is highly optimized so there's less data transfer involved. Riverbed Technology and Cirtas Systems are both doing this. In other cases, backup vendors like Symantec or CommVault are integrating the cloud as a tier to which data is migrated during a data protection lifecycle. You can still keep your most recent stuff on-site, but move old stuff to the cloud, or create copies in the cloud for DR [disaster recovery].
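The tiering idea described above, with recent backups kept on-site and older ones migrated to the cloud, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Symantec, CommVault or any other vendor implements it: the `BackupSet` class, the function names and the 30-day local retention window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupSet:
    """Hypothetical record describing one backup (not a vendor data model)."""
    name: str
    created: datetime
    size_gb: float


def choose_tier(backup, now, local_retention=timedelta(days=30)):
    """Return 'local' for backups inside the retention window, else 'cloud'."""
    age = now - backup.created
    return "local" if age <= local_retention else "cloud"


def plan_migration(backups, now, local_retention=timedelta(days=30)):
    """Group backup names by the tier an age-based policy would place them in."""
    plan = {"local": [], "cloud": []}
    for backup in backups:
        plan[choose_tier(backup, now, local_retention)].append(backup.name)
    return plan


if __name__ == "__main__":
    now = datetime(2011, 1, 31)
    backups = [
        BackupSet("weekly-full", datetime(2011, 1, 24), 500.0),
        BackupSet("q3-archive", datetime(2010, 9, 30), 1200.0),
    ]
    print(plan_migration(backups, now))
```

A real data protection lifecycle would likely weigh more than age (restore-time objectives, bandwidth, compliance), but age alone captures the "recent stuff on-site, old stuff in the cloud" split.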

There are a lot of benefits to these technologies, and they warrant consideration for anyone making changes in their data protection infrastructure. All of this innovation is helping the cloud work well for some really interesting use cases. It's not right for primary storage; it's not going to be your primary storage solution in your data centers.

Are there features cloud storage services offer that you can't get with local file storage systems?

Boles: I've mentioned a couple of items like this idea of data access from anywhere, which is great for follow-the-sun enterprises and such. But I think one of the biggest things you get from cloud storage vs. local file storage is extreme data protection. If you look at most of the providers out there today, you're talking about multisite replication of data, often with at least three copies. Such replication is also built on top of a foundation designed for data integrity. Just think about it. The data integrity issues that arise at extreme scale are totally different from those you deal with in your own data center, and the things providers have done to protect themselves against those issues may mean you can find your best data integrity in the cloud.

For instance, the Zetta guys like to tout the fact that they do all kinds of checksumming, periodic data reading and rewriting, and more to protect against things like RAID write holes or just gradually deteriorating disk media. For the customer, what you get is this multisite data protection with extreme data integrity, and all that is delivered on top of the provider being off-site from your facility to begin with.
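To make the checksumming and periodic read-and-rewrite idea concrete, here is a minimal Python sketch of a scrub pass over replicated copies. It is an assumption-laden illustration, not a description of what Zetta or any provider actually runs: the site names, the `scrub` function and the use of SHA-256 are all hypothetical choices.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a block of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def scrub(replicas: Dict[str, bytes], expected: str) -> Dict[str, bytes]:
    """Re-read every replica, verify it, and rewrite damaged copies.

    `replicas` maps a (hypothetical) site name to the bytes stored there;
    `expected` is the checksum recorded when the data was first written.
    """
    intact = [site for site, data in replicas.items()
              if checksum(data) == expected]
    if not intact:
        raise ValueError("no intact replica left to repair from")
    good_copy = replicas[intact[0]]
    # Keep replicas that verify; overwrite the rest from a known-good copy.
    return {site: (data if checksum(data) == expected else good_copy)
            for site, data in replicas.items()}


if __name__ == "__main__":
    original = b"hr-manual-2009"
    stored = {
        "site-a": original,
        "site-b": b"hr-manuaX-2009",  # simulated silent corruption
        "site-c": original,
    }
    repaired = scrub(stored, checksum(original))
    print(all(copy == original for copy in repaired.values()))
```

With three copies, a single silently corrupted replica is detected on the next scrub and rewritten from an intact one, which is the kind of protection multisite replication plus integrity checking is meant to provide.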

When you add that to the low pay-as-you-go cost and the ease of use associated with things like extreme instantaneous scalability, deep data protection can multiply the value you get from storing data in the cloud. Why? Because once you have data out in the cloud you can often stop worrying about backing it up or protecting it.

What type of back-end infrastructure should storage administrators concerned about performance and reliability look for?

Boles: That's a great question, and it's a pretty tricky question to ask today. The reality is that it's almost impossible to do thorough due diligence on cloud infrastructures. Most of the vendors out there consider much of their technology to be 'secret sauce' behind the scenes, and in many cases they're such a large multi-tenant organization to begin with that you just can't get those types of answers from the vendor. Imagine trying to do due diligence on Amazon's S3 storage while going through their self-service portal.

The cloud needs to be treated like a service provider relationship in the traditional sense. Look for the characteristics you'd expect from a system or business you're going to interface with. Demand an established reputation and realize there may be a premium there worth paying for. Look for third-party opinions and expect to operate on a foundation of contracts and SLAs [service-level agreements]. With a reputable provider, those SLAs are ultimately going to be your real insight into what they themselves think is possible and typical.

That being said, you should look to your provider to be able to have real conversations or to provide publicly available details about what they do in the back end when it comes to protecting your data against bit rot, against catastrophic site loss, against connection outages and more. They'll provide some detail, and the depths of those answers will shed some light on the technical capabilities that exist and how they're differentiated between providers.


