
Behind the scenes at Carbonite's online backup service

[Photo: Carbonite CEO David Friend in his Boston-area office]

“Do-it-yourself” infrastructure is a competitive differentiator among storage service providers, I’ve learned in conversations over the last two weeks. While not every Web 2.0 service is storage-focused, these discussions make me wonder what the results will be for third-party storage vendors looking to supply prepackaged configurations to service-provider data centers.

Following Carbonite’s lawsuit against its former storage supplier, competitors such as SpiderOak have pounced on the opportunity to tout their own internal infrastructures in an attempt to lure worried Carbonite customers.

SpiderOak CEO Ethan Oberman told me that SpiderOak assembles its own storage systems out of commodity servers and disk drives, purchasing individual components and assembling them under the company’s proprietary storage clustering software. “We don’t rely on a third party pre-assembled storage system” as Carbonite did with Promise, Oberman said.

Shortly after I posted about Oberman’s statements, Carbonite CEO David Friend invited me to see Carbonite’s infrastructure. I took him up on that last Friday, and it turns out Carbonite’s setup isn’t much different from what SpiderOak described.

[Photo: The current Carbonite infrastructure – sets of 15 one-terabyte SATA drives packed into racks of custom Dell equipment]

Carbonite has between 10 PB and 12 PB of storage in two data centers in the Boston area. While the vendor is suing Promise over products it deployed several years ago, Carbonite has already completely swapped out the Promise storage in favor of a self-integrated system of Dell PowerVault MD1000 and MD3000 arrays, each packed with 15 one-terabyte SATA disks configured for RAID 6. Four of these units are attached to each server node that runs the company’s internally written parallel file system.
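For a sense of scale, here is a rough back-of-the-envelope sketch (mine, not Carbonite's tooling) of what those numbers imply, assuming plain RAID 6 arithmetic of two parity drives per 15-drive set and ignoring hot spares, file system overhead and replication:

```python
# Back-of-the-envelope capacity math implied by the article's figures.
# Assumptions (not from Carbonite): raw RAID 6 arithmetic only, no hot
# spares, no file system overhead, no replication.

DRIVES_PER_ENCLOSURE = 15   # 15 one-terabyte SATA disks per MD1000/MD3000
DRIVE_TB = 1                # one-terabyte drives
RAID6_PARITY = 2            # RAID 6 gives up two drives' worth of capacity
ENCLOSURES_PER_NODE = 4     # four units attached to each server node

usable_per_enclosure = (DRIVES_PER_ENCLOSURE - RAID6_PARITY) * DRIVE_TB  # 13 TB
usable_per_node = usable_per_enclosure * ENCLOSURES_PER_NODE             # 52 TB

low_end_tb = 10_000         # low end of the 10 PB to 12 PB range, in TB
print(f"usable per enclosure: {usable_per_enclosure} TB")
print(f"usable per node:      {usable_per_node} TB")
print(f"nodes for ~10 PB:     {low_end_tb / usable_per_node:.0f}")
```

On those assumptions, each node fronts roughly 52 TB of usable capacity, so the low end of Carbonite's range works out to somewhere in the neighborhood of 200 server nodes.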

[Photo: Carbonite infrastructure detail]

SpiderOak’s Oberman said his company assembles the disk drives and RAID controllers internally. Friend said he’s still content to let a third-party vendor assemble the RAID arrays despite the experience with Promise.

“The software is what we worry about,” he said. Promise’s arrays had firmware bugs, he said, something that might not have changed if Carbonite had done more of the hardware assembly. “Even if you buy a disk drive from somewhere, it has firmware in it – we’re not going to get into that kind of stuff,” Friend said.

Carbonite chose Dell to replace Promise based on a discounted price and its willingness to work with Carbonite to design a customized hardware system, according to Friend.

The more I talk to online storage service providers, the more there seems to be a disconnect between what they’re deploying and what storage vendors are marketing in an effort to reach Web 2.0 shops. While new “cloud” storage systems such as EMC’s Atmos and HP’s ExDS are built on industry-standard hardware components, the vendors also supply software to tie those components together.

Friend said he’s learned that a fully prepackaged software-hardware system from a third-party vendor won’t fit his business. “Every piece of software we’ve bought along the way has broken,” he said.

But this also may be because Carbonite is an outlier in terms of its workload. “There aren’t a lot of 10 petabyte data centers out there,” Friend said. He estimated that some 95% of the processing time in Carbonite’s data center is spent on write, rather than read, operations. “There [also] aren’t a lot of data centers out there that are ‘mostly write,’” he added.

Carbonite also designed its parallelized distributed file system to treat data in its data center and on users’ PCs as part of one big geographically distributed pool. Friend claims this is a differentiator, providing speedier restores than competitors such as Mozy, which must reassemble files before restoring data.
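Carbonite hasn't published its file system's internals, so what follows is only a hypothetical sketch of the design point Friend describes: the restore path yields a file's blocks in order as they arrive from storage nodes, so the client can start writing immediately instead of waiting on a server-side reassembly step. The names restore_stream, block_map and fetch are mine, purely for illustration:

```python
# Hypothetical illustration of a streaming restore path; none of these
# names come from Carbonite's actual software.
from typing import Callable, Iterator, List, Tuple

def restore_stream(
    block_map: List[Tuple[str, int]],        # ordered (node_id, block_id) pairs
    fetch: Callable[[str, int], bytes],      # retrieves one block from one node
) -> Iterator[bytes]:
    """Yield a file's blocks in order as they come back from storage nodes."""
    for node_id, block_id in block_map:
        # The client can write each block to disk as it lands, rather than
        # waiting for the whole file to be reassembled on the server first.
        yield fetch(node_id, block_id)

if __name__ == "__main__":
    # Stand-in for a distributed block store, keyed by (node, block).
    fake_store = {("node-a", 0): b"hello ", ("node-b", 1): b"world"}
    restored = b"".join(
        restore_stream([("node-a", 0), ("node-b", 1)],
                       lambda node, block: fake_store[(node, block)])
    )
    print(restored.decode())  # hello world
```

The contrast with the reassemble-then-send approach Friend attributes to competitors is that time to first byte of the restore does not depend on total file size.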

For those reasons, Friend said he doesn’t anticipate that online services focused primarily on storing customer data will be fertile ground for existing storage vendors. This hasn’t stopped third-party storage vendors from making regular sales calls to Carbonite’s data center, according to senior director of operations Kai Gray. Gray said he listens to most of the pitches, but he echoed Friend on the issues with prepackaged software, and said the cost comparison equation has yet to change.

“By the time [a storage vendor] puts stuff together and marks it up, it’s too expensive,” he said. Storage product competition in this data center plays out at the disk-drive level rather than at the systems level. “We’re eagerly awaiting two-terabyte disk drive shipments,” Gray said. Right now Carbonite has mostly Western Digital disk drives deployed, but “we are very drive agnostic.”

While Carbonite has yet to go for a third-party “cloud” storage system, Friend also pointed out that the company is a different animal from many other Web 2.0 companies. “Most data centers are a cost center, not the business itself,” he said. “This is our factory – everything has to be customized because it’s a competitive advantage. It’s worth it to spend money designing our own file system, but if you’re, say, Fidelity, you don’t want to do that.”

[Photo: Carbonite CEO David Friend and director of operations Kai Gray in one of Carbonite’s Boston-area data centers]

Digital archiving the next frontier?

The data center I saw was very impressive – it’s in one of the newest facilities in the Boston area, complete with ultrasonic humidifiers and state-of-the-art security. But it’s not too far from Carbonite’s other data center, bringing to mind what ESG founder and blogger Steve Duplessie wrote after Carbonite announced the Promise lawsuit. The analyst cautioned that enterprise users should ask online backup services about things like SLAs and geographic redundancy to distinguish between consumer/prosumer and enterprise services before signing over their backups.

I asked Friend about this. Carbonite sees itself as a consumer/prosumer offering, he said, and does not offer SLAs or redundancy outside the Boston area. “Because we’re offering a backup service, there’s already geographic redundancy between the user’s PC and our data center,” he said. “No one [in our market] seems to want to pay double for a backup of a backup.”

However, “if we get into archiving, where we might have the only copy of a document, geographic redundancy would come into play,” he said. Is Carbonite planning that move? “We’re thinking about it,” he said. “It would be a logical product line extension.”

