Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab. In this expert interview for SearchStorage, Marks tackled seven questions related to using cloud storage, ranging from the value of service-level agreements to determining which applications are truly a good fit at a time when many IT administrators are considering going beyond backup and moving primary data and business applications to the cloud.
Much has been made of cloud storage moving beyond backup to other use cases. Can you name a few examples of other types of data or applications that are well suited for using cloud storage?
Howard Marks: I like using the really low-cost cloud storage services, like Amazon Glacier and Fujifilm's Permivault, for archives. Run an archiving app like Metalogix Archive Manager or Symantec's Enterprise Vault, keep the index in-house for fast queries, and keep the data out in the cloud where it's someone else's problem to maintain.
The second example is using a cloud storage gateway like TwinStrata or Microsoft's StorSimple as primary storage that backs itself up to the cloud. These appliances use local storage, sometimes SSDs [solid-state drives], as a cache; if the cache is large enough, you can keep working even if the Internet connection goes down.
But my favorite cloud storage application is the global file system that a network of Nasuni or Panzura appliances creates. Connect a dozen offices with these appliances, and the whole cluster acts like a single file server. Video created in LA can be read and have its CGI [computer-generated imagery] applied in San Francisco and viewed by the investors in New York like they were all local files.
Let's say you're ready to start using cloud storage beyond backup. What is the best way to evaluate other data to see if it's a good fit for cloud storage?
Marks: I like to think of it like this: What function do I want the cloud to give me? A place to put data without worrying about how to manage it? Remember that accessing data over the Internet introduces a lot of latency; that makes it appropriate for backups or file sync-and-share, but not for direct access by databases.
As an off-site backup and disaster recovery [DR] solution? With the right gateway, the cloud can act as real-time backup and disaster recovery for just about any application or remote office.
As centralized storage for a global file system? Again, this solution depends on the right gateway -- but the results can be astounding.
Do I have to be using object storage to succeed in placing primary data in the cloud?
Marks: That depends on what you mean by using object storage. Do you have to get new applications that write to the S3 [Simple Storage Service], Swift or CDMI [Cloud Data Management Interface] API? No. You can use cloud storage with your existing applications.
You may have to use a gateway of some sort. If your applications are at all latency-sensitive, as all databases would be, make sure the gateway has enough cache to make cache misses rare events.
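To see why rare cache misses matter so much, consider a rough weighted-average model of read latency through a gateway. This is a minimal sketch, not a description of any vendor's appliance: the function name and the 1 ms local-cache and 100 ms cloud round-trip figures are illustrative assumptions, not measurements.

```python
def effective_latency_ms(hit_rate: float, cache_ms: float, cloud_ms: float) -> float:
    """Average read latency when a fraction `hit_rate` of reads is served
    from the gateway's local cache and the rest go out to the cloud."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * cloud_ms


# Assumed figures for illustration only: 1 ms from local cache,
# 100 ms for a round trip to the cloud provider.
CACHE_MS = 1.0
CLOUD_MS = 100.0

for hit_rate in (0.50, 0.90, 0.99, 0.999):
    avg = effective_latency_ms(hit_rate, CACHE_MS, CLOUD_MS)
    print(f"hit rate {hit_rate:.1%}: about {avg:.2f} ms average read latency")
```

Under these assumed numbers, the miss penalty dominates the average until the hit rate gets very close to 100 percent, which is the case for sizing the cache generously for latency-sensitive applications.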
File sync-and-share offerings are on the rise. Why are these a natural fit for cloud storage?
Marks: In many ways, sync and share is a natural cloud application. First of all, you don't know how popular it's going to be until you turn it on, so [without the cloud] you have to invest in a lot of storage that may not be used for a while. If you use a sync-and-share service like Box, or run your own software and store the data with a public cloud storage service, then you'll only pay for what you use, and your users will be able to get their data from wherever they are. That elasticity is a key advantage of public cloud offerings.
What types of data don't belong in the cloud?
Marks: With most new technologies, we start by recommending that users keep running their mission-critical applications the old way, just in case. Mostly, we're just being careful and limiting our exposure if something should go wrong. The cloud is different; I wouldn't hesitate to recommend Salesforce.com, Outlook.com or other SaaS [Software as a Service] solutions that outsource those mission-critical applications.
I would say that cloud storage, with a few gateway-enabled exceptions, isn't appropriate for most organizations' mainline, transactional applications. The latency associated with accessing data across the Internet is just too great, and the cost of truly reliable Internet connections too high, to make cloud storage fast or cheap enough for that.
The cloud can also appear to be much cheaper than more traditional storage media, like tape, for long-term storage, just because the storage providers quote in cents [per GB, per month]; but people forget that 60 cents per GB, per month is $21.60 for three years and $36 for five. If you have lots of data and multiple data centers, you might save money with your own object storage.
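The arithmetic behind those figures is simple to reproduce. The sketch below takes the 60 cents per GB, per month rate quoted above and assumes a flat price with no retrieval or transfer fees; the function name is hypothetical. Prices are kept in whole cents to avoid floating-point rounding surprises.

```python
def cumulative_cost_dollars(cents_per_gb_month: int, years: int, gigabytes: int = 1) -> float:
    """Total paid over `years` at a flat monthly per-GB rate.

    Assumes the price never changes and ignores retrieval and
    data-transfer charges, which real services may add.
    """
    total_cents = cents_per_gb_month * 12 * years * gigabytes
    return total_cents / 100


for years in (3, 5):
    print(f"{years} years: ${cumulative_cost_dollars(60, years):.2f} per GB")
# prints:
# 3 years: $21.60 per GB
# 5 years: $36.00 per GB
```

Passing a `gigabytes` value scales the same calculation to a whole archive, which makes the comparison with tape or in-house object storage easier to see.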
We've heard a lot about service-level agreements [SLAs] for cloud providers. But some experts say they don't take into account one aspect that providers can't control -- the WAN -- and that makes them less than reliable. What's your take on cloud SLAs?
Marks: I consider SLAs to be a declaration by the provider of the level of reliability they're going to try to provide. There are just too many factors, including the reliability of your Internet connection, that are outside the control of the cloud provider. And therefore, they will be excluded from the SLA.
The other problem with SLAs is that they just don't pay enough when there's an outage. Back when we used film, let's say the lab lost the roll of film you shot on your honeymoon in Tahiti. They would give you a new roll of film. Similarly, if your cloud storage provider goes offline for a couple of hours, you may not have to pay the bill for the month.
But just as your Tahitian honeymoon photos are worth a lot more than the roll of film -- and cost a lot more to create -- almost by definition your losses from the outage will be more than the month's fees. If they weren't, then the ROI [return on investment] of your application is so low [that] you might be better off shutting it down.
Then, what can users do to help get the most out of their SLA?
Marks: I think you have to continue to take responsibility for your application's availability. The SLA is just a target the vendors set for their reliability. If you really want to keep your data safe and your application available, you have to add diversity to your use of the cloud.
Get Internet connections from two separate ISPs [Internet service providers] that use different media, say telco fiber and cable TV infrastructure, so a failure of one doesn't separate your users from their cloud apps.
For your top-tier applications, the ones you set up with high-speed replication and failover to a DR site, you'd be abdicating your responsibility if you just moved those apps to a single cloud provider's infrastructure or region. For top availability you need to write your data, or run your apps, in two clouds. That may mean local private cloud storage infrastructure with failover to AWS [Amazon Web Services], or simply using Mozy to back up the same folders you sync and share with SugarSync or Dropbox.