In this Chicago Storage Decisions 2012 TechTalk interview, cloud storage expert and Dragon Slayer Consulting founder Marc Staimer answers some of the most frequently asked questions about building private cloud storage. Topics include the advantages of building a private cloud, the characteristics of the cloud, the software required, the use of storage area network (SAN) and network-attached storage (NAS) infrastructure, common misconceptions about private storage clouds, and applications that are a good fit for private cloud storage.
SearchStorage.com: Do you physically build a private cloud or is it more of a change in the way you approach storage?
Staimer: Well, it's not like lifting weights. You're not building your body, so to speak. But yes, it's similar to building a regular storage system. You need components, which [are] software and hardware. Sometimes the software comes bundled in hardware shrink-wrap. But the hardware is primarily x86 servers with embedded storage.
You need [network interface cards] NICs and Ethernet switches, and that's pretty much all the components you need. In some cases, some cloud storage providers will allow you to repurpose [direct-attached storage] DAS or SAN storage [behind it] as well.
SearchStorage.com: What are the advantages of building a private cloud?
Staimer: [There are a] huge number of advantages. Traditional storage -- whether it be file-based storage or SAN-based storage, block-based storage -- [isn't] really designed for the massive amounts of passive data we're generating and creating.
Passive data is data you don't access very often. In most organizations, about 90% of the data is passive and 10% is active, but we put [passive data] on systems designed for active data. Cloud storage is designed for passive data. Now, when we save passive data, it's going to be kept for a long period of time and at a low cost -- low cost meaning a much, much lower cost than on traditional storage.
Why keep [data] on expensive storage if we don't value it that much in the sense of accessing it that often? [Cloud storage] solves a cost problem. It solves a data permanence problem. It solves a resilience problem. It solves a geographically dispersed problem.
If you want to provide workflow sharing, collaboration and things of that nature, cloud storage is ideal for that. So, those are some of the reasons why you would use cloud storage.
SearchStorage.com: What are the characteristics necessary to make something considered "cloud"?
Staimer: Well, first, cloud and cloud storage aren't the same thing. You have cloud computing, cloud applications and you have cloud storage. A cloud application would be something like iTunes, Salesforce.com, Google Docs or Office 365. Basically, you're accessing an application, and the application happens to have storage behind it. That's storage in the cloud.
Cloud storage has to have different characteristics because you're accessing the storage directly over the Internet. Whether it's a private intranet or a public Internet, VPN or whatever, you're still accessing it over the Internet.
Because you're accessing it over the Internet, it has to have different characteristics. The interface is going to be REST or SOAP. You can put REST or SOAP on almost any storage system, but the key issue is does it have a pay-by-the-drink paradigm? In other words, you pay for what you consume in arrears vs. pay for what you might consume. That includes hardware and software. Again, cloud storage has a different pay paradigm.
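The difference between paying in arrears for what was consumed and paying up front for what might be consumed can be sketched in a few lines. This is only an illustration: the function names, the averaging of daily usage and the rate are all hypothetical, and real providers meter and price in their own ways.

```python
def bill_in_arrears(daily_usage_gb: list, rate_per_gb_month: float) -> float:
    """Charge for the average capacity actually consumed over the period.

    Averaging daily readings is an assumption made for this sketch.
    """
    if not daily_usage_gb:
        return 0.0
    average_gb = sum(daily_usage_gb) / len(daily_usage_gb)
    return average_gb * rate_per_gb_month


def bill_provisioned(provisioned_gb: float, rate_per_gb_month: float) -> float:
    """Charge for capacity bought up front, whether or not it is used."""
    return provisioned_gb * rate_per_gb_month
```

With the same hypothetical rate, a month that averages 200 GB of real usage is billed for 200 GB under the first model, while a 1,000 GB purchase is billed in full under the second.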
In addition, you have to have that resilience capability. You have to have that ability to store things for a very long period of time, which means you need to have certain types of technology you don't typically find in traditional storage, such as erasure codes or multicopy mirroring.
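As a rough illustration of the erasure-code idea, the sketch below uses single XOR parity, about the simplest erasure code there is: the data is split into k fragments plus one parity fragment, and any one lost fragment can be rebuilt from the rest. Cloud storage products generally rely on stronger codes that survive several simultaneous losses; the function names here are hypothetical and not taken from any product.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def recover(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) by XOR-ing the survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    repaired = list(frags)
    if missing:
        survivors = [f for f in frags if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(frags: List[Optional[bytes]], original_len: int) -> bytes:
    """Repair if needed, drop the parity fragment, and strip the padding."""
    data_frags = recover(frags)[:-1]
    return b"".join(data_frags)[:original_len]
```

Multicopy mirroring achieves the same goal by keeping whole copies instead; parity-style codes trade extra computation for less raw capacity.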
But the key factor is that it has to be geographically dispersed. [Cloud storage] has to know where it is. It has to know the different locations. If you set policies that a user needs to get the lowest possible response time, then [the technology] moves the data closer to the user.
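A location-aware placement policy of this kind might look roughly like the sketch below: given measured latencies from a user to each site, make sure a copy lives at the closest one. The class, the function names, the site names and the latency figures are all hypothetical, and adding a site to a set stands in for a real replication or migration step.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    sites: set = field(default_factory=set)  # locations currently holding a copy


def nearest_site(latency_ms: dict) -> str:
    """Return the site with the lowest measured latency to the user."""
    return min(latency_ms, key=latency_ms.get)


def apply_low_latency_policy(obj: StoredObject, latency_ms: dict) -> str:
    """Ensure a copy exists at the site closest to the user; return that site."""
    target = nearest_site(latency_ms)
    if target not in obj.sites:
        obj.sites.add(target)  # placeholder for an actual data movement
    return target


# Example with made-up sites and latencies:
report = StoredObject("report.pdf", {"chicago"})
closest = apply_low_latency_policy(
    report, {"chicago": 48.0, "london": 9.5, "tokyo": 130.0}
)
```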
It's a much smarter form of intelligent storage than traditional storage because it's based on object storage. So, [it's] not that you can't have cloud storage that isn't object storage; it just doesn't really do everything cloud storage should.
SearchStorage.com: What kind of software is required to build a private cloud?
Staimer: It has to have basically all the characteristics of the cloud. In other words, cloud storage is all based on the software. The hardware, as I said, typically is x86 and internal drives, SATA, SAS, nearline SAS, [solid-state drives] SSDs.
But it's the software that makes a difference. So software from [companies such as] Amplidata, Cleversafe, Scality, Caringo, Dell, HP, OpenStack, Nirvanix, EMC Atmos and IBM -- which is basically a resale of Nirvanix -- but you have a variety. [There are] lots of different opportunities, but the software itself is the cloud storage software that just runs on vanilla x86 servers.
SearchStorage.com: Is object storage essential for building a private storage cloud?
Staimer: It's not essential, but it's hard to [build a private cloud] without it -- especially cost-effectively. Not that you can't, but you can't necessarily provide all the characteristics you get from object storage. Object storage, by definition, means the data doesn't have to be laid across a consistent storage system -- it just has to be consistent with the rules of the data.
That means the data can be presented in different nodes, different systems [and] in different locations. [Object storage] is aware because you have a lot more metadata. Any other form of storage doesn't have the level of metadata or intelligence to do some of the things cloud storage needs to do to meet the needs of the market.
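To make the metadata point concrete, here is a toy in-memory object store in which every object carries its own metadata and is addressed by a flat identifier, independent of where it physically sits. The class and method names are hypothetical, and using a SHA-256 hash of the content as the identifier is an assumption of this sketch, not something the interview specifies.

```python
import hashlib


class ToyObjectStore:
    """In-memory stand-in for an object store: flat IDs, data plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data with arbitrary metadata; return its identifier."""
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]["data"]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every given key/value."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj["metadata"].get(k) == v for k, v in criteria.items())
        ]
```

Because policies such as retention period or preferred region travel with the object as metadata, a system built this way can act on them without consulting a separate file system or volume layout.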
SearchStorage.com: What role does open source software play in building a private cloud?
Staimer: Well, there's one called OpenStack. The storage piece of OpenStack is called Swift. It's a work in progress. Some of the players are actually going to market with it, like Rackspace, HP and Dell. They all have OpenStack storage projects going, Swift projects.
HP has been in private beta for about a year; now they're going to public beta and moving into a generally available product soon. They make up for some of the shortcomings on that software -- there are a lot of shortcomings on OpenStack, [but] it's getting better -- by putting storage systems that are cloud-integrated with it, like Panzura. That's one of the things HP does.
For example, OpenStack has a limit of a 5 GB file. Panzura doesn't. So, if you have large files, it just breaks them into 5 GB chunks for HP. There are ways to get around it. But OpenStack, which is the open source [option], is playing a role. It's got about two to three years before it catches up with the industry in general, and the industry in general keeps advancing, so the bar keeps moving out some.
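The chunking workaround can be sketched as follows: a large file is planned as a series of segments no bigger than the limit, with a manifest recording them in order. This is a minimal sketch, not how Swift or Panzura actually implement it; the segment naming, the manifest layout and the reading of 5 GB as 5 * 1024**3 bytes are assumptions.

```python
SEGMENT_LIMIT = 5 * 1024**3  # the 5 GB limit cited above, taken here as 5 GiB


def plan_segments(total_size: int, limit: int = SEGMENT_LIMIT) -> list:
    """Return (offset, length) pairs covering total_size in chunks of at most limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [
        (offset, min(limit, total_size - offset))
        for offset in range(0, total_size, limit)
    ]


def build_manifest(name: str, total_size: int, limit: int = SEGMENT_LIMIT) -> list:
    """List the segments, in order, that together make up the original file."""
    return [
        {"segment": f"{name}/{i:06d}", "offset": off, "length": length}
        for i, (off, length) in enumerate(plan_segments(total_size, limit))
    ]
```

A 12 GiB file, for instance, would be planned as two full 5 GiB segments followed by one 2 GiB segment, and a reader would reassemble it by fetching the segments in manifest order.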
SearchStorage.com: Can you turn your data center’s SAN or NAS into a private cloud?
Staimer: Yes, you can. Not everybody can do it. You need to have the right vendor partner. Nirvanix can do it right now, because that's how they've architected their cloud to work. They can front a NAS or SAN behind it. Most of them don't. It doesn't mean they can't. They don't because it's a support issue.
SearchStorage.com: If you can’t, is it a sales issue?
Staimer: Yes and no. It's more of a support issue than a sales issue because there isn't a lot of margin in disk drives embedded in x86 servers. So, it comes down to support issues more than anything else. It's much more costly to do it.
SearchStorage.com: Does your infrastructure have to be highly virtualized to build a private cloud?
Staimer: It does not. It can be totally physical, not virtualized at all.
SearchStorage.com: What are some of the misconceptions about building a private cloud?
Staimer: The biggest misconception about building one is that it's hard -- [but] it's actually very easy. The other big misconception is you can't build performance into storage clouds. You can. It depends on the vendor; take Scality, for example. They can provide response times in their storage cloud equivalent to a SAN. In fact, they've replaced a number of SANs for things like a Web-based email from some of the cable providers and some of the ISPs.
Another misconception is that if you build a private cloud, you can't work with a public cloud. Well, that's not true either. The de facto interface in the public cloud is the Amazon S3 interface, because Amazon is the largest public cloud service provider in the world, with like 1.3 to 1.5 trillion files in its storage cloud. So, interfacing isn't that bad. There's a new standard coming out called [Cloud Data Management Interface] CDMI that's being pushed by SNIA and a number of vendors have adopted it.
SearchStorage.com: Are there certain applications that can benefit from running on a private storage cloud?
Staimer: Archive, backup, any kind of data protection, e-discovery, email archive -- any of those works really well. Again, it's a lot of passive data. Some of the cloud service providers now have a Hadoop interface, [Hadoop Distributed File System] HDFS. It can write directly into the cloud storage and use the cloud storage.
Architecturally, they're very similar -- multiple nodes spread across lots of different nodes -- so they work well together. Pretty much most of them will have a Hadoop interface within the next 12 months.