Cloud storage is a common target for backups and archives, but it's a tricky proposition for primary data.
Marc Staimer, president of Dragon Slayer Consulting, advises IT shops to use hybrid cloud storage or cloud-integrated storage for primary data when applications run at their on-premises data centers. With hybrid cloud storage, often in the form of an appliance, a copy of the most active data is stored on-premises and the less active data is tiered to the cloud. Cloud-integrated storage refers to a cloud gateway product, SAN, network-attached storage (NAS) or a unified storage system that tiers and/or caches data to cloud storage.
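As a rough illustration of the tiering idea described above, the following Python sketch splits data between an on-premises tier and a cloud tier by how recently it was accessed. It is a minimal sketch, not how any particular appliance works: the `Block` type, the `place` function, the 30-day threshold and the sample file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    days_since_access: int


def place(blocks, hot_days=30):
    """Split blocks into (on_prem, cloud) lists by recency of access.

    hot_days is an assumed policy threshold, not a vendor default.
    """
    on_prem = [b.name for b in blocks if b.days_since_access <= hot_days]
    cloud = [b.name for b in blocks if b.days_since_access > hot_days]
    return on_prem, cloud


# Hypothetical sample data for illustration only.
blocks = [
    Block("orders.db", 1),
    Block("q1-report.pdf", 45),
    Block("logs-2019.tar", 400),
]
on_prem, cloud = place(blocks)
print("on-premises:", on_prem)
print("cloud:", cloud)
```

Real products typically weigh more than access age (for example, access frequency or pinned volumes), so the single threshold here only stands in for whatever policy the system applies.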
The other scenario Staimer recommends for storing primary data in the cloud is running the application with the cloud provider's compute service to improve performance.
In this first part of a two-part podcast interview with TechTarget senior writer Carol Sliwa, Staimer identifies guaranteed service-level agreements (SLAs) as the hottest trend in cloud storage for primary data and discusses how users can mitigate latency issues and consider cloud storage for their most active data.
For what types of primary data is cloud storage most appropriate these days?
Staimer: When we start talking about primary data in the cloud, it comes down to the latency between the application and its data. That means the storage and the application need to be located relatively close together. So, if the application's in the cloud and the storage is in the cloud with it, that storage can be almost any kind. It can be SAN. It can be NAS. It can even be object, because latency is less of an issue. Obviously, it varies by storage system.
The kind of primary data that you can use with primary storage in the cloud is basically just about anything. It depends on the SLAs. [SLAs] are what a lot of the cloud storage providers live and die by. When they put a primary application or a mission-critical application in their cloud, the users typically want a certain performance level that's measurable. There are certain penalties when it's not met, usually no more than what the users are paying. What the [SLA] does is help enforce that, and the service provider today typically has certain capabilities, either in software or in hardware. There are storage systems that will do this, and there is software for storage systems that can guarantee the SLAs to a point, depending on the resources available. So, the kind of primary data that you want to have in the cloud is the kind for which you can get the same results you would be looking for if it weren't in the cloud.
Are you saying that a company should store primary data in the cloud only if the application is also running in the cloud -- in other words, the application is running on the cloud provider's compute service?
Staimer: The performance of the application is tied to the latency of reading and writing its data. If you don't have them colocated, you're going to have highly variable, and generally poor, performance. If they are colocated, you're going to get the performance that you typically would expect and see in your own data center. That's why you tend to want to colocate them. Can you not colocate them? Sure you can. Just expect a much lower performance level.
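The link between latency and application performance can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below assumes a worst case in which every I/O must wait for the previous one to finish. The 1 ms and 40 ms round-trip times are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def max_serial_iops(round_trip_ms):
    """Upper bound on I/Os per second when each I/O waits for the previous one."""
    return 1000.0 / round_trip_ms


def time_for_ops(n_ops, round_trip_ms):
    """Seconds needed to complete n_ops dependent I/Os back to back."""
    return n_ops * round_trip_ms / 1000.0


colocated_ms = 1.0  # assumed round trip when app and storage share a site
remote_ms = 40.0    # assumed round trip from a data center to a distant cloud

for label, rtt in (("colocated", colocated_ms), ("remote", remote_ms)):
    print(label, max_serial_iops(rtt), "IOPS max;",
          time_for_ops(10_000, rtt), "s for 10,000 I/Os")
```

Under these assumptions the same 10,000 dependent I/Os take 10 seconds when colocated and 400 seconds when remote. Applications that issue many I/Os in parallel suffer less, which is one reason results vary so much from one workload to another.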
Are there cases where primary data could work in a situation without colocation with the application -- in other words, where the application is running in a separate place from the cloud storage?
Staimer: Sort of. It would be an application-to-application type of view. Let me put that in perspective. Let's say you're running a Hadoop cloud in Location A and you have a typical SQL or NoSQL application running in Location B. You could be running the Hadoop data, which is separate from the analytical aspect of the results of the Hadoop, and you can move that data from Point A to Point B and then analyze it at that application. But ultimately, you want to keep the data as close to the application as you can. That is usually a best practice. Anytime the data is not close to the application, you're going to have significant lag, delay, latency, whatever you want to call it, that typically is unacceptable to most users in an active or primary data situation.
What are the hottest technology trends with respect to cloud-integrated storage for primary data?
Staimer: The hottest is guaranteed SLA capabilities. Some storage systems can do it in solid-state, and others can do it with a hybrid. Others do it with software where they're basically robbing Peter to pay Paul: when there's an SLA issue, when performance is falling below the requirement of the application or the user, the software will take resources from elsewhere and make sure that application gets the SLAs it's looking for. SLAs are typically tied to performance, response time and latency, and that is probably the hottest thing going on right now in the service provider [market] for primary data in the cloud: being able to provide those guaranteed SLAs -- and when I say guaranteed, I mean enforced by automated systems, not human beings.
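The "robbing Peter to pay Paul" approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's algorithm: the `rebalance` function, the tenant names and the IOPS figures are all hypothetical, and real systems also manage bandwidth, cache and queue depth.

```python
def rebalance(allocated, guaranteed, tenant):
    """Move IOPS to `tenant` from other tenants' surplus until its guarantee is met.

    Only capacity above another tenant's own guarantee is taken, so the
    guarantee is met only as far as spare resources allow.
    Returns a new allocation dict; the total never changes.
    """
    result = dict(allocated)
    shortfall = guaranteed[tenant] - result[tenant]
    for other in result:
        if shortfall <= 0:
            break
        if other == tenant:
            continue
        surplus = result[other] - guaranteed[other]
        take = min(max(surplus, 0), shortfall)
        result[other] -= take
        result[tenant] += take
        shortfall -= take
    return result


# Hypothetical tenants and IOPS figures for illustration only.
allocated = {"erp": 3000, "analytics": 6000, "archive": 1000}
guaranteed = {"erp": 5000, "analytics": 4000, "archive": 500}
new_alloc = rebalance(allocated, guaranteed, "erp")
print(new_alloc)
```

In this example the "erp" tenant is 2,000 IOPS short of its guarantee, and "analytics" has exactly that much surplus, so the shortfall is covered without touching "archive". If the combined surplus were smaller than the shortfall, the guarantee would be only partially restored.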
For primary data, in what ways have your recommendations changed over the last few years on public cloud, private cloud, hybrid cloud or cloud-integrated storage?
Staimer: Primary data almost invariably requires a cloud-integrated storage play today. If you're going to use a cloud architecture, you need something that's going to give you the performance locally that you expect, whether that be on solid-state, whether that be on spinning disk, whether it be a combination. Whatever it is, you're going to need that locally, which means that the storage system itself needs to have the ability to tier the data to the cloud, whether it be a private object storage cloud or whether it be a public cloud. It has to be able to tier to it and still give you complete control and meet your security requirements.
So from my point of view, if you're going to do primary data and your application resides locally, you need cloud-integrated storage. If you're going to do primary data in the cloud and your application resides in the cloud, you want to use the storage that's in that cloud. That's how my recommendations have changed.
Read and listen to part 2 of the podcast: Cloud-based nearline storage options enhanced for big data