ClearSky Data in August launched its tiered storage-as-a-service software platform, which targets enterprises with at least 100 TB of primary data center storage. The company was founded by IT industry veterans Ellen Rubin, who serves as CEO, and CTO Lazarus Vekiarides, formerly the executive director of software engineering for Dell's EqualLogic Storage Engineering Group.
ClearSky Data's managed service lets companies run primary storage workloads in the cloud. The company calls its tiered approach a global storage network, which consists of physical, on-site flash storage hardware that replicates data to its point-of-presence (POP) data centers, and then replicates data from the POPs to the cloud as a logical bucket of object storage in Amazon Simple Storage Service (S3). The vendor operates POPs in Boston, Philadelphia and Las Vegas, with plans to seek additional financing to expand to other metropolitan areas.
Rubin was founder of cloud migration software startup CloudSwitch -- sold to Verizon -- and real-time personalization software company Manna. She also served as a vice president at data warehouse appliance maker Netezza, now part of IBM. SearchCloudStorage sat down with Rubin to discuss ClearSky Data's plans to get customers to embrace the public cloud for storage, how the enterprise storage market is ripe for an overhaul and ClearSky Data's rollout strategy.
You've spent your career working for tech companies, but not in storage. Why did you get into enterprise storage with ClearSky Data?
Ellen Rubin: We feel it's time for storage to catch up with the rest of the infrastructure stack. Enterprise IT doesn't much use the global cloud. But when you stand up storage, traditionally, you buy storage arrays, hook them up, and do all the backup and disaster recovery to a secondary site. We think it's time for that model to adapt to what customers want, which is that computing workloads could be anywhere.
What are the components of the ClearSky Data service?
Rubin: We offer a global storage network, somewhat modeled on Akamai's content distribution network. The idea is that we will have metro-based points of presence in every major U.S. city. We're planning to cover North America within our first year.
Our intellectual property is around a distributed set of caches that moves data around. We use commodity hardware, but the brains of the service is our software, which knows which blocks of data should go where and tiers them in an optimized way, with deduplication, compression and WAN optimization built into the service.
We offer an SLA-based service that guarantees five nines of availability and a comprehensive security model where we encrypt everything. The customer always has control. Our managed service can deliver hundreds of thousands of IOPS at very, very low latency. Any traditional enterprise application will still run as if the storage is sitting locally in a traditional storage array. We do it without the customer having to manage the infrastructure, and we are one-third the cost of traditional storage.
How does ClearSky's storage tiering work?
Rubin: The data a customer needs to run its business will be kept in local hot cache that uses flash for high performance. Data in warm cache at our POP is heavily in flash and available to customers within two milliseconds. We have a little bit of hot disk in the cloud for holding on to some data for a longer period of time.
That's why we put so much work into storage networking. On the back end, we use public clouds like Amazon S3 for data that is unlikely to be accessed anytime soon. We write locally and do write-back cache to the public cloud, which is where our data protection takes place. We write all data back to the cloud, so customers always have access to it.
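The tiering Rubin describes can be pictured with a small sketch. This is an illustrative toy model only, not ClearSky Data's software: the class name `TieredStore`, the capacities, and the least-recently-used eviction policy are all assumptions made for the example. It shows writes landing in a local hot cache, cooler blocks being demoted to a warm cache, and a write-back step that copies everything to a cloud object store.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model: hot cache (on site), warm cache (POP), cloud object store.

    Hypothetical illustration; the LRU policy and capacities are assumptions.
    """

    def __init__(self, hot_capacity, warm_capacity):
        self.hot = OrderedDict()    # on-site flash cache, oldest entry first
        self.warm = OrderedDict()   # POP cache, oldest entry first
        self.cloud = {}             # backing object store (e.g. an S3 bucket)
        self.dirty = set()          # blocks written locally, not yet in cloud
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def _put_hot(self, block_id, data):
        self.hot[block_id] = data
        self.hot.move_to_end(block_id)
        # Demote the least recently used blocks to the warm tier.
        while len(self.hot) > self.hot_capacity:
            old_id, old_data = self.hot.popitem(last=False)
            self._put_warm(old_id, old_data)

    def _put_warm(self, block_id, data):
        self.warm[block_id] = data
        self.warm.move_to_end(block_id)
        while len(self.warm) > self.warm_capacity:
            old_id, old_data = self.warm.popitem(last=False)
            # Never drop a cached block that has not reached the cloud yet.
            if old_id in self.dirty:
                self.cloud[old_id] = old_data
                self.dirty.discard(old_id)

    def write(self, block_id, data):
        """Write locally first; the cloud copy is made later by flush()."""
        self.warm.pop(block_id, None)  # discard any stale warm copy
        self._put_hot(block_id, data)
        self.dirty.add(block_id)

    def flush(self):
        """Write-back step: copy every dirty block to the cloud store."""
        for block_id in list(self.dirty):
            data = self.hot.get(block_id, self.warm.get(block_id))
            self.cloud[block_id] = data
        self.dirty.clear()

    def read(self, block_id):
        """Return (data, tier_that_served_the_read)."""
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id], "hot"
        if block_id in self.warm:
            data = self.warm.pop(block_id)
            self._put_hot(block_id, data)  # promote on access
            return data, "warm"
        data = self.cloud[block_id]        # raises KeyError if unknown
        self._put_hot(block_id, data)
        return data, "cloud"
```

In this sketch a read reports which tier served it, so the effect of a block cooling off and later being promoted again is easy to observe. A real service would also have to handle concurrency, failures and the deduplication, compression and encryption Rubin mentions, none of which is modeled here.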
What type of flash device do you place on the customer's site?
Rubin: It's a 2U Intel-based commodity device with redundant everything. It's a compute-heavy box that we use to figure out where data gets tiered and to handle all data encryption. We do this all at wire speed and we manage it, so the customer never touches it.
They scale up and scale out. The initial footprint is a single chassis that's redundant all the way through. The initial footprint holds up to about 10 TB of [data in] the hot cache. That cache could be fronting a petabyte of storage all the way on the back end. We use a very small footprint of flash for data that needs it all the time. As that block of data becomes cooler, it will be evicted from the front-end cache, but still be cached in the warm tier in our POP.
How are you pricing your service?
Rubin: The customer has to commit to a minimum of 20 TB of capacity for at least a year. We price it like a tiered model. You would have a per-gigabyte price per month, depending on how much capacity you have with us -- including both primary data, as well as snapshot data.
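As a rough illustration of how such a bill might be computed, here is a minimal sketch. Only the 20 TB minimum and the fact that primary and snapshot data both count come from the interview. The per-gigabyte rates, the tier boundaries, the use of 1,000 GB per TB, and the choice of a single rate selected by total capacity are all hypothetical placeholders, not ClearSky Data's actual pricing.

```python
from decimal import Decimal

GB_PER_TB = 1000       # assumption: decimal units
MIN_COMMIT_TB = 20     # minimum commitment stated in the interview

# HYPOTHETICAL rate card: (upper bound in TB, price per GB per month).
# None marks the open-ended top tier. These numbers are placeholders.
RATE_TIERS = [
    (100, Decimal("0.10")),
    (500, Decimal("0.08")),
    (None, Decimal("0.06")),
]


def monthly_bill(primary_tb, snapshot_tb):
    """Return the monthly charge for whole-TB capacities (integers).

    Assumes one rate applies to all capacity, chosen by the total amount.
    """
    # Primary and snapshot data both count toward billable capacity,
    # and the customer pays for at least the minimum commitment.
    billable_tb = max(primary_tb + snapshot_tb, MIN_COMMIT_TB)
    for ceiling_tb, rate in RATE_TIERS:
        if ceiling_tb is None or billable_tb <= ceiling_tb:
            return billable_tb * GB_PER_TB * rate
```

Under these placeholder rates, a customer storing 10 TB of primary data and 5 TB of snapshots would still be billed for the 20 TB minimum.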
How does ClearSky Data handle storage security?
Rubin: The first version of the service built in full AES 256 encryption for both data in transit and data at rest. We handle the key management on the front-end edge appliance that is part of the service. We provision that and install it, but it sits in the customer's data center and they control it. That's basic table stakes.
We also encrypt data at the transport layer. We put network connectivity in as part of the service. It's a network of private lines, so it's not connecting over the Internet, but we encrypt it anyway. Then, we isolate the customer's environment using containers, so that even though it's shared infrastructure, each customer is isolated logically within the point of presence, which is a multi-tenant service. The first version takes advantage of Amazon's Virtual Private Cloud security, so customers will have end-to-end protection all the way through to the cloud.
How will ClearSky provide its regional data centers?
Rubin: We are not building data centers or building hardware. We have a partnership with Digital Realty. We take a small footprint in a single cabinet in a colocation facility, which has enterprise-class physical security and compliance. Digital Realty is in every major city in the world and has dozens to hundreds of carriers connecting to the cloud on the back end. Our goal is to have a POP within 120 miles of any customer.
What is the storage profile of your target customer?
Rubin: We use about a dozen criteria to qualify whether we make sense for a customer. The customer needs to be in the metro area where we have a POP, or are about to have a POP. They need to [have storage] in a data center, which rules out the [small and medium-sized business] SMB market. Typically, they would have at least 100 TB of total storage. So, there's a certain size and shape that we're looking for. The other thing that's very important -- and it's kind of funny -- is: Have they ever put anything outside their firewall at all? Have they ever put data in [Amazon] EC2? If the answer is no, they're probably not going to consider a managed service.
How will ClearSky Data stand out among all the noise around cloud storage?
Rubin: We think we have something disruptive, but we have to educate the market. To us, primary storage is the key thing. In our model, cloud storage and disaster recovery are features, not separate pieces of the infrastructure. We have to pay our way forward and prove to customers that we can provide them with the data protection, performance and low latency that we claim. We spend a lot of time with their security teams. When you do enterprise primary storage, you have to get ready for that.