BOSTON -- Two years after acquiring Ocarina Networks, Dell Inc. is still working on integrating Ocarina’s data reduction technology into its storage arrays and servers. Carter George, Dell’s executive director of storage strategy, gave SearchStorage.com a progress report on that integration during an interview at Dell Storage Forum 2012 earlier this month.
Topics discussed included:
1) Why Dell missed the projected 2011 ship date for adding data deduplication capabilities to the Dell Fluid File System.
2) Plans to add dedupe and compression capabilities to block storage through Dell’s Compellent and EqualLogic arrays, using a new object layer internally code-named “Bob.”
3) An exploratory project, code-named “Rocket,” for dedupe on servers.
4) The importance of adding dedupe to Dell’s upcoming flash-based Fluid Cache in servers.
5) Putting Dell's Fluid File System on top of DX Object Storage.
SearchStorage.com: How far has Dell come in achieving its vision of having deduplication throughout its product lines?
George: Our vision is the same, which is Ocarina everywhere. Every Dell product or storage service will have some level of Ocarina compatibility baked into it, and the reason is everybody is going to be doing data reduction. At the scale of data growth that everybody’s facing, it’s just table stakes.
The real benefit of Ocarina everywhere isn’t so much about the data reduction itself. Where the advantage is going to come is by having compatible dedupe and compression everywhere, so any time you move data from one place to another, you’ll be able to move it in its most efficient form.
Where are we with that? We didn’t have any Ocarina product until 2011. The first thing that came out was Ocarina embedded in the DX scalable object store mid-year last year.
At the end of last year, we shipped the DR4000 [backup deduplication appliance], which is interesting because backup wasn’t Ocarina’s thing. Ocarina was for primary storage. But there was a lot of pressure to have a Data Domain-like [appliance] when we broke up with EMC, and a lot of resources were directed to creating that. This particular model is [comparable to] the low end of the Data Domain [line]. We have several new models coming out that fill out that product line if you want the bigger ones.
Now the thing that was a big disappointment was, at that same time, we were supposed to have Ocarina on the Dell Fluid File System. Didn’t happen. Some problems cropped up, and they had to go back to the drawing board and come up with a different way of doing integration. That’s at least a year behind schedule. It was supposed to come out at the very, very end of 2011, and now it’ll come out in early 2013.
SearchStorage.com: Why such a long delay?
George: They found some problems with the way that Ocarina was integrated with the metadata and all that. It wasn’t just a bug or something. It was a fundamental no-no. This wasn’t the right way to do this. So, that’s why it’s such a big slip.
SearchStorage.com: Will they be able to resolve the problems?
George: It has been resolved, and in fact, the new approach is working great. It’s well on its way. It has nothing to do with how Ocarina works. It has to do with how it gets meshed into the actual file system.
It’s one of those things where it’s really easy to get the prototype working for the 80% case, but it’s all the corner cases and failure cases. Getting something that really does work in production, not in a demo, turns out to be hard.
SearchStorage.com: Are the corner cases with specific applications or data types?
George: No, they’re all about if you lost the disks that had the index, and at the same time, this other failure happened. We don’t want to introduce any case where you could lose your data because you turned this on.
SearchStorage.com: So, when dedupe is integrated with the Dell Fluid File System, does that mean customers will be able to use it with any Dell NAS system?
George: Yeah, it will come with all the NAS systems. It’s FS7600 for EqualLogic, the FS8600 for Compellent and the NX3600 for PowerVault. Those are the hardware products you buy in order to get our file system.
SearchStorage.com: Where else will Dell customers see dedupe?
George: DX is done. The cloud is based on DX, so that’ll get Ocarina in the cloud when the cloud comes out. My impression is that that will be available before the end of the year. They’re going to have a storage-as-a-service offering based on that object store, and they just get dedupe as part of that.
SearchStorage.com: What’s left?
George: EqualLogic and Compellent for block. Both areas have some fundamental work to do to be able to put dedupe and compression in, and it has to do with how they manage pages. When you’re writing blocks to an array, you’re writing 4K chunks. But arrays don’t usually store those separately as 4K chunks. They put 'em in pages. A page is just an internal construct. Every vendor has different ways they do this.
The first thing that has to happen is you have to have a way to make your fundamental unit of storage variable size instead of fixed. That’s a big job, so they’re actually plumbing in, in EqualLogic and Compellent, a new object layer at the bottom of the array. Instead of a page, what you write out is an object, and that object is variable size. Internally, that’s called Bob, for block-on-object. It’s not a commercial object store. We’re not going to sell it as an object store. The EqualLogic guys built Bob, but Compellent is going to use Bob.
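To illustrate why data reduction pushes an array toward a variable-size storage unit, here is a minimal Python sketch. It is not Dell's or Ocarina's code: it uses the standard zlib compressor as a stand-in, and the sample blocks and the `compressed_sizes` helper are made up for illustration. The point is only that equal 4K inputs come out at very different sizes once compressed, so a fixed-size page no longer fits.

```python
import os
import zlib

BLOCK_SIZE = 4096  # the 4K write unit mentioned in the interview


def compressed_sizes(blocks):
    """Return the stored size of each block after zlib compression."""
    return [len(zlib.compress(block)) for block in blocks]


# Three hypothetical 4K blocks with very different contents.
blocks = [
    bytes(BLOCK_SIZE),              # all zeros: compresses extremely well
    b"dell" * (BLOCK_SIZE // 4),    # repetitive text: compresses well
    os.urandom(BLOCK_SIZE),         # random data: does not compress
]

sizes = compressed_sizes(blocks)
for size in sizes:
    print(f"{BLOCK_SIZE} bytes in -> {size} bytes stored")
```

Every input is the same size, but the stored sizes differ widely, which is the situation a variable-size object is meant to handle.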
Nothing’s coming out in 2012, but in 2013, you’ll see Ocarina for Compellent first and then EqualLogic. The way EqualLogic does things is we have a major firmware release about once a year, and if you finish something a week after deadline, you have to wait a year to get the next train.
SearchStorage.com: What about dedupe on servers?
George: There is a project called Rocket, and Rocket was an exploration of dedupe on servers. We built a prototype. There were some various and sundry hang-ups. Dell isn’t really set up to sell software. The other hang-up was that there wasn’t any good way to do it for VMware, since such a high percentage of our servers now are VMware. We could do it for Windows as a filter driver. We could do it for Linux. VMware is pretty closed. There’s no such thing as a filter driver in VMware. There are no APIs for getting into it. So, I think the decision was made to hold off and see if we could come up with some way to solve that.
SearchStorage.com: Any other places customers can expect to see dedupe?
George: I think it’s going to make a lot of sense to put it in the Fluid Cache, the RNA-based stuff. Anyplace where there’s flash, there’s a big win from doing in-band dedupe. Flash is organized as cells, and any given cell only can be written to so many times. What’s more, every time you write to the same cell, you’re shortening the life of that cell. Everybody knows that. What fewer people know is that you’re also making it slower. The more you use a given cell in flash, the slower it becomes.
So, if you can avoid doing writes, that’s good. This is why Pure Storage, XtremIO, SolidFire and a lot of the startups in that flash space have dedupe right away in Version 1.0, because it helps with the flash.
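The idea that in-band dedupe saves flash writes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Fluid Cache or any named product works: the `DedupeStore` class, its SHA-256 fingerprinting, and the in-memory dictionaries standing in for flash are all hypothetical, and it omits reference counting and space reclamation.

```python
import hashlib


class DedupeStore:
    """Toy in-band dedupe: identical blocks reach the media only once."""

    def __init__(self):
        self.chunks = {}          # fingerprint -> block data (stand-in for flash)
        self.block_map = {}       # logical block address -> fingerprint
        self.physical_writes = 0  # writes that actually reached the media

    def write(self, lba, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.chunks:
            # New content: this is the only case that costs a media write.
            self.chunks[fingerprint] = data
            self.physical_writes += 1
        # Duplicate or not, the logical address just points at the fingerprint.
        self.block_map[lba] = fingerprint

    def read(self, lba):
        return self.chunks[self.block_map[lba]]


store = DedupeStore()
for lba in range(100):
    store.write(lba, b"A" * 4096)   # 100 logical writes of identical content
store.write(100, b"B" * 4096)       # one logical write of new content
print(store.physical_writes)        # prints 2
```

In this sketch, 101 logical writes turn into two writes to the underlying media, because the check happens before the data is stored rather than in a later cleanup pass.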
SearchStorage.com: What’s the timetable?
George: I don’t know. With Fluid Cache, we’re heads down right now just getting the 1.0 out, and then we’ll sit down and talk about what’s next.
SearchStorage.com: When is Fluid Cache due?
George: Late Q1, I think, of next year. It might be Q2.
SearchStorage.com: What’s the future direction of NAS and the Dell Fluid File System?
George: What people are expressing a desire for is a big cost-optimized repository where they can just throw everything and not worry about it. [EMC’s] Isilon to me is the only one that has done it well. It’s very simple to manage. It’s the beauty of Isilon. You add nodes, and they just have one big volume and it grows. You put the node in. It grows.
We have a system that works exactly like this in our DX Object Store. It scales out. It’s exactly the model that cloud storage is being built on. Isilon has their OneFS file system, but that doesn’t talk to disk. That talks to -- and they don’t usually really talk about this publicly -- that talks to an object layer. They don’t have a RAID layer. They have a Reed-Solomon erasure-coded object layer, and that’s what has the nodes with disks.
One of the things I think you’ll see from us going forward is where we take our DX Object Store -- which has an infinite number of standard servers with disks in ’em, and you can just keep adding those -- and put the Fluid File System on top of that. That’s the next thing for us.
SearchStorage.com: What’s the scalability limit?
George: There’s no limit. [That’s] one of the reasons we bought Exanet. Exanet as a standalone company ran out of money and had fallen behind on the feature war with NetApp. So, we’ve got some catching up to do with features, but architecturally, it’s really strong.
One of the most important technologies going forward in the unstructured [data] world is erasure coding. [RAID] is using math to solve for conditions of failure. It turns out there are more sophisticated algorithms that can solve for more things than just disk failure. You can draw a pyramid of failures. What if I lost the whole shelf of disks? What if I lost controllers? What if I lost a rack? What if my site burned down? What if there's a citywide disaster, etc.? It turns out that all of these things could be protected mathematically. That field of mathematics is called erasure coding.
Isilon has this. They have Reed-Solomon, an older erasure code, but well respected. And DX Object Store has erasure coding. It's a very intense mathematical algorithm.
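For readers who want a feel for the math, the simplest erasure code is single XOR parity, the same idea behind basic RAID parity. The Python sketch below is an illustration only: it is not Reed-Solomon and not what Isilon or DX Object Store implements, and the function names and sample data are invented. It tolerates the loss of any one fragment; codes such as Reed-Solomon generalize this to multiple simultaneous losses.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def encode(data_blocks):
    """Return the data blocks plus one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(fragments, lost_index):
    """Rebuild the fragment at lost_index from all surviving fragments."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_blocks(survivors)


data = [b"node", b"rack", b"site"]   # three equal-length data fragments
fragments = encode(data)             # three data fragments + one parity fragment
rebuilt = recover(fragments, lost_index=1)
print(rebuilt)                       # prints b'rack'
```

If each fragment lives on a different disk, shelf or site, losing any single one of them leaves enough information to rebuild it from the rest.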