Two of the most popular cloud-based storage services that enterprise IT shops increasingly compare and consider are Amazon Web Services Inc.'s Simple Storage Service and Microsoft's Windows Azure Storage.
Both Amazon Simple Storage Service (S3), which launched in 2006, and Windows Azure Storage, generally available since 2010, claim to store more than 1 trillion objects and handle peaks of more than 500,000 requests per second. But potential customers evaluating Amazon S3 vs. Azure Storage have a number of distinctions to sort out, including cloud storage pricing, gateways and integration options.
In this podcast interview with SearchStorage's Carol Sliwa, Marc Staimer, president of Dragon Slayer Consulting in Beaverton, Ore., offers practical advice on comparing Amazon S3 vs. Azure Storage with enterprise use cases in mind.
[Storage Infrastructure as a Service provider] Nasuni [Corp.] recently released a State of Cloud Storage report, and in it, Nasuni claimed that Windows Azure blob storage beat last year's top-ranked Amazon S3 based on the criteria of performance, availability and scalability. What's your take on the study?
Marc Staimer: My take on the study is Microsoft's probably happy with it, and Nasuni's probably happy with it. It's very self-serving for Nasuni. It's not a standard test. In fact, it's not a test I would personally recommend. It's designed for Nasuni, and it's based on Nasuni products. So, it's not one that I would take to the bank. It lost credibility with me when I saw how it ranked one of the other players in the list, which I have seen score better than Microsoft or Amazon in more standardized tests. So, it's not something I would call empirical or standardized.
Amazon talks objects and buckets, and Microsoft has blobs, tables, queues and drives. We hear a lot about REST [Representational State Transfer] APIs. What does the average IT shop need to know to get started with Amazon S3 and Windows Azure Storage?
Staimer: REST, for those of you who don't know what it is, is really programmatic HTTP. So, what you do on your computer when you surf the Web, when you type in a 'www' address -- which is a domain name [that] is converted into actual IP addresses -- that's basically what you'd be doing with REST. It's just programmatic. So, REST is kind of a standard, although the industry standard is CDMI [Cloud Data Management Interface]. CDMI was standardized by SNIA [the Storage Networking Industry Association]. Neither Amazon nor Microsoft currently supports CDMI. The de facto standard is Amazon's, which happens to be the S3 API.
But candidly, from a user point of view, if you want to work natively from your application to the storage service, you will have to make some programmatic changes to your applications so they can talk either to the API or to the REST interface. They're not major changes. They're not rocket science. But they still have to be made.
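As a rough illustration of what "programmatic HTTP" means here, the sketch below builds the same kind of GET request a browser would issue, but from code. The endpoint, container and object names are hypothetical placeholders, not a real Amazon or Microsoft address, and the request is only constructed, never sent. Real services also require signed authentication headers, which are left out.

```python
from urllib.request import Request

# Hypothetical endpoint, for illustration only (not a real storage service).
ENDPOINT = "https://storage.example.com"


def build_get_object_request(container: str, key: str) -> Request:
    """Build (but do not send) an HTTP GET for one stored object."""
    url = f"{ENDPOINT}/{container}/{key}"
    # Real services also expect signed authentication headers, omitted here.
    return Request(url, method="GET",
                   headers={"Accept": "application/octet-stream"})


# Hypothetical container and object names.
req = build_get_object_request("reports", "2013/q1.csv")
print(req.get_method(), req.full_url)
```

The "programmatic changes" an application needs are largely of this kind: replacing local file reads and writes with HTTP requests against the service.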
The alternative is to use a gateway. There are gateways that come from both Amazon and Microsoft for their services, or there are third-party gateways. Now, they're not really gateways when they're from the third party, and in fact, even Microsoft's is not really what I would call a gateway. So, it's more of a cloud-integrated storage, and that's a term that was coined by ESG, Enterprise Strategy Group. [Cloud-integrated storage] primarily [has] the performance of primary storage, so it's not like a temporary-type product or another product. You can use it as if it were local storage.
You mentioned gateways and cloud-integrated storage. Microsoft acquired StorSimple's appliances, Amazon offers its AWS Storage Gateway virtual appliance and third-party products support both services. How do the gateway and cloud-integrated storage options for S3 and Azure stack up against each other? Do you think a gateway is essential to use the two services?
Staimer: Let me answer the last part of that question first. It's not essential, but it has huge advantages. I'm going to start with the AWS Storage Gateway, which is software that you download. They charge for it on a monthly basis, as soon as you start using it, and it is pretty much a gateway. It runs as a virtual machine. You have two modes for it. You can use it as a caching volume. Caching volumes basically say, 'I'm going to store most of my data on the S3 service,' and 'I'm going to store some of my data,' which is the hot data or the data I need to have for local response time, [and] 'I'm going to keep that locally.' So, that's their Gateway-Cached Volumes.
They have a second iteration of that, which they call Gateway-Stored Volumes. Gateway-Stored Volumes say, 'I need to keep all that data local, and I'm just keeping a copy of it in the cloud.' It allows you to do snapshots and move it to the cloud. It's very limited [capacity] on Gateway-Stored Volumes. They top out at 12 TB, and for cached volumes, it's basically 150 TB because you're not keeping 150 TB locally. They keep those volumes as iSCSI volumes, using either DAS [direct-attached storage] or whatever your storage infrastructure might be, connected to your virtual server. That kind of gateway is somewhat limited because they're a service provider. It kind of makes sense. They don't do anything to reduce the amount of data stored on the servers, and its performance is going to be mediocre.
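The difference between the two modes described above can be sketched as a toy model. This is not how the AWS Storage Gateway is implemented; the class names, the least-recently-used eviction rule and the dictionary standing in for the cloud are all assumptions made for illustration.

```python
from collections import OrderedDict


class CachedVolume:
    """Cached mode: the cloud holds every block; only hot blocks stay local."""

    def __init__(self, cloud: dict, local_capacity: int):
        self.cloud = cloud                  # stand-in for the storage service
        self.local = OrderedDict()          # small local cache of hot blocks
        self.local_capacity = local_capacity

    def write(self, block_id, data):
        self.cloud[block_id] = data         # the full data set lives in the cloud
        self._keep_local(block_id, data)

    def read(self, block_id):
        if block_id in self.local:          # fast path: served locally
            self.local.move_to_end(block_id)
            return self.local[block_id]
        data = self.cloud[block_id]         # slow path: fetched from the cloud
        self._keep_local(block_id, data)
        return data

    def _keep_local(self, block_id, data):
        self.local[block_id] = data
        self.local.move_to_end(block_id)
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)  # evict the least recently used block


class StoredVolume:
    """Stored mode: all data stays local; the cloud only receives snapshots."""

    def __init__(self, cloud: dict):
        self.cloud = cloud
        self.local = {}

    def write(self, block_id, data):
        self.local[block_id] = data

    def read(self, block_id):
        return self.local[block_id]         # reads never leave the site

    def snapshot(self):
        self.cloud.update(self.local)       # copy the current state to the cloud


cached = CachedVolume(cloud={}, local_capacity=2)
for block_id in ("a", "b", "c"):
    cached.write(block_id, block_id.encode())

stored = StoredVolume(cloud={})
stored.write("a", b"a")
```

In the cached model the local footprint stays at two blocks no matter how much is written, which is why a cached volume can be much larger than the local disk behind it.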
Now, the Azure gateway -- it's not really a gateway; as I said, it's cloud-integrated storage -- has higher performance. So, it acts like a good shared storage system on a SAN, again iSCSI. It's well integrated with some of their services, like SharePoint and SQL Server and Exchange, so that has a tighter integration on the application level. The second thing that they do is they actually do some data reduction. So, they will reduce the amount of data that's physically stored on their service. It will dedupe and compress and encrypt in both the gateway from Amazon and the StorSimple cloud-integrated storage, but [StorSimple] will actually dedupe and compress before [it sends the data] to the Azure service so you're actually consuming less cloud storage.
It, too, is somewhat limited in the actual amount of physical storage it comes with. It tops out at 20 TB, but again, if you're using it as a caching appliance, you have virtually unlimited storage on the Azure side of it. You can have more than one on site, and it will look and act and feel like you're really dealing with one storage system, because it's all putting it in the same place. And you can still access the data on one from the other by bringing it back down from the cloud. So, it's a pretty straightforward methodology in both cases.
Gateways or cloud-integrated storage make life a little simpler as far as putting data on the cloud or taking it off, because as far as you're concerned, it's all local. And, ultimately, you don't have to change any of your programs or applications to use it.
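To see why deduplicating and compressing before upload reduces the cloud storage consumed, here is a minimal sketch. It assumes fixed-size chunks, SHA-256 fingerprints and zlib compression. None of that is taken from StorSimple's actual design, and encryption is left out to keep the example short.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration


def reduce_for_upload(data: bytes, uploaded: dict) -> list:
    """Upload each unique chunk once, compressed.

    Returns the ordered list of chunk fingerprints needed to rebuild the data.
    """
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in uploaded:               # dedupe: skip chunks already stored
            uploaded[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: list, uploaded: dict) -> bytes:
    """Rebuild the original bytes from the stored chunks."""
    return b"".join(zlib.decompress(uploaded[d]) for d in recipe)


cloud = {}  # stand-in for the storage service
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
recipe = reduce_for_upload(data, cloud)
stored_bytes = sum(len(v) for v in cloud.values())
print(len(data), "bytes written locally;", stored_bytes, "bytes sent to the cloud")
```

Four chunks are written locally, but only two unique chunks reach the cloud, and each of those is compressed first.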
Which service holds the advantage on pricing, both in terms of cost and level of complexity?
Staimer: Complexity, I would have to say, Microsoft is less complex. Cost, it depends what you end up negotiating. On a list-price basis, they're relatively close, but Amazon has continued to be the cost leader as far as lowering prices. What you end up paying will vary. It depends on how often you bring data out of the cloud, as well as put it into the cloud. There's a whole mix of things. [Amazon has] a truly flexible service as far as more services than just S3, where you can put your data. It comes into how much redundancy you want, what kind of data protection you want, where you want to keep the data stored, whether or not you're going to access it at all. If you want to put cold data up there -- by 'cold,' I mean you don't really access it -- they have a service [called Amazon Glacier] for that.
Microsoft, on the other hand, just doesn't have as many different services at this point, although I expect that to change. Their costing tends to be a little flatter. It can end up costing more, but with that cloud-integrated storage, you can be using less. So, there's no consistent answer here. It depends on what you're doing and what services you're using and what data protection you need and what your requirements are to determine which is actually more expensive or less expensive. It's not a direct apples-to-apples comparison.
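A back-of-the-envelope model shows how these factors interact. Every rate below is a made-up placeholder, not a real Amazon or Microsoft price, and the two rate cards are hypothetical. The only point is that a higher per-gigabyte rate can still produce a lower bill when less data is stored.

```python
from dataclasses import dataclass


@dataclass
class RateCard:
    """Placeholder prices in dollars; none of these are real list prices."""
    storage_per_gb_month: float
    egress_per_gb: float
    per_10k_requests: float


def monthly_cost(rates: RateCard, stored_gb: float, egress_gb: float,
                 requests: int) -> float:
    """Sum the three charges discussed above: capacity, data out, requests."""
    return (stored_gb * rates.storage_per_gb_month
            + egress_gb * rates.egress_per_gb
            + requests / 10_000 * rates.per_10k_requests)


# Two hypothetical rate cards; the second has the lower per-GB storage rate.
higher_rate = RateCard(storage_per_gb_month=0.10, egress_per_gb=0.12,
                       per_10k_requests=0.01)
lower_rate = RateCard(storage_per_gb_month=0.09, egress_per_gb=0.12,
                      per_10k_requests=0.01)

# Assumption: data reduction halves what is stored under the higher rate.
cost_with_reduction = monthly_cost(higher_rate, stored_gb=500,
                                   egress_gb=100, requests=1_000_000)
cost_without_reduction = monthly_cost(lower_rate, stored_gb=1000,
                                      egress_gb=100, requests=1_000_000)
print(cost_with_reduction < cost_without_reduction)
```

Changing the egress volume or request count shifts the result again, which is why a comparison on list price alone can mislead.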
In the final analysis, what's your overall recommendation on Amazon S3 vs. Azure Storage?
Staimer: If you are a pure Windows shop or are primarily a Windows shop, you're going to find feature functions and assets and capabilities integrating in with Azure you won't find with Amazon, and so, that'll feel more like your environment and will feel [like] an easier transition. And we're not just talking the storage side, but actually the cloud compute side as well. From that perspective, you might find [Azure] more comfortable; not that it's better, just more comfortable. And there are some things that you can do, as I said, with SharePoint integration and Exchange integration and SQL integration that you're not going to find on Amazon, either on the AWS or the S3 services.
On the other hand, if Windows is just part of your shop and you have a lot of other things, and you've got a lot of Linux and you've got a lot of Unix, you might find AWS and Amazon S3 a little bit more comfortable.
Essentially, comparing two services is very, very difficult because it comes down to a matter of taste. It comes down to a matter of your perspective. There is no empirical comparison that says this one is that much better than that one. Yes, I know Nasuni tried to do that, but again, it was on a very specific test written for their stuff.
So, one of the things you can do is try both services with a small project and see which one you like best, and that's always a good thing to do with any type of technology. But it's very, very difficult to say, 'This one is empirically better than that one,' or 'That one is empirically better than this one.' That's why I look at it and say, 'It's a matter of taste. It's a matter of judgment on your part.' Each individual has got to make that decision for [himself or herself]. It's not something that someone, an analyst or an expert, is going to say, 'Well, this is the one I would go for versus that one,' because they're too different.