Hedvig Inc. CEO and founder Avinash Lakshman envisions a world where self-service, virtualized storage runs in clusters on commodity hardware across private and public clouds, such as Amazon, Google and Microsoft Azure.
His startup, based in Santa Clara, Calif., will release the second version of the Hedvig Distributed Storage Platform this month. Hedvig is also promoting its Universal Data Plane, a programmable data management layer designed to let any application workload store and protect data across any cloud or tier.
Lakshman spoke with SearchCloudStorage about software-defined versus traditional storage, the ways public cloud providers paved the way, the best use cases for Hedvig's approach and the difficulty in articulating his company's ultimate end goal.
What is unique about your Universal Data Plane?
Lakshman: A lot of it has to do with my background, which is in distributed systems. I spent three years, from 2004 to 2007, at Amazon, where I was one of the co-inventors of Amazon Dynamo. Amazon Dynamo is the genesis of the entire NoSQL movement. After that, I moved to Facebook in early 2007. I was a very early engineer there, and Apache Cassandra was my brainchild. It's one of the most popular NoSQL systems out there.
The common thread between both these systems [is] that they are inherently multisite, meaning you can deploy them as clusters that span multiple sites. What we have done with Hedvig is push that idea even further. We've blurred the lines between a customer-run data center site and the public cloud. You can have clusters span any site and provide that capability natively. You can run our systems across multiple cloud vendors -- perhaps Google, AWS [Amazon Web Services], Azure -- and keep data replicated across all of them seamlessly.
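Lakshman does not describe how Hedvig decides where replicas live. As a minimal sketch, assuming a fixed list of sites and a replication factor of 3, the Python below uses rendezvous (highest-random-weight) hashing to pick distinct sites for each block, so every copy lands in a different data center or cloud. The site names, the `place_replicas` function and the choice of rendezvous hashing are illustrative assumptions, not Hedvig's actual placement algorithm.

```python
import hashlib


def place_replicas(key: str, sites: list[str], replication_factor: int = 3) -> list[str]:
    """Pick `replication_factor` distinct sites for `key` (hypothetical helper).

    Each site gets a pseudo-random score derived from hash(key, site); the
    highest-scoring sites win. Every node computes the same answer from the
    key alone, so no central placement table is required.
    """
    if replication_factor > len(sites):
        raise ValueError("replication factor exceeds number of sites")

    def score(site: str) -> int:
        digest = hashlib.sha256(f"{key}:{site}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(sites, key=score, reverse=True)[:replication_factor]


# Hypothetical sites spanning an on-premises data center and three public clouds.
sites = ["on-prem-dc", "aws-us-east", "gcp-us-central", "azure-eastus"]
print(place_replicas("volume-42/block-7", sites))
```

One property of this approach: adding a fifth site relocates only the blocks for which the new site now ranks in the top three; every other block keeps its existing placement.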
In what ways is your model more flexible than traditional storage?
Lakshman: We are in the day and age of cloud. Cloud means a lot of things to a lot of people. To me, it's virtualization of some form plus self-service, and it's the self-service aspects that are completely absent in traditional systems. The ability to program against them, the ability to have them expose REST-based APIs that let you provision storage assets from, say, your iPhone -- those are completely absent in [traditional storage], and they are essential for staying relevant in today's world. That's what we bring to the table.
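To make "provision storage from your iPhone" concrete, here is a hedged sketch of what a self-service REST call could look like, using only Python's standard library. The base URL, the `/api/v1/volumes` path, the JSON field names, the bearer-token header and the `build_provision_request` helper are all hypothetical placeholders; Hedvig's real API is not described in this interview. The request is built but not sent.

```python
import json
import urllib.request


def build_provision_request(base_url: str, name: str, size_gb: int,
                            replication_factor: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking a hypothetical storage API for a new volume."""
    # Hypothetical request body: field names are placeholders, not a real schema.
    body = json.dumps({
        "name": name,
        "size_gb": size_gb,
        "replication_factor": replication_factor,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/volumes",  # hypothetical endpoint path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
    )


req = build_provision_request("https://storage.example.com", "analytics-vol", 500, 3, "TOKEN")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

The point is less the specific fields than the fact that provisioning becomes an ordinary HTTP call that any script, app or phone can issue without a storage administrator in the loop.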
Will the replication-based, software-defined storage model actually be less expensive for enterprises?
Lakshman: The whole world today is moving away from custom hardware solutions. Commodity hardware is the name of the game. All these systems are designed to run on off-the-shelf commodity hardware; no custom ASICs [application-specific integrated circuits], nothing. Scale-up systems are practically dead. Everyone's moving to a scale-out-like architecture. The ROI is pretty simple, because hardware costs are going nowhere but down.
Companies like Amazon, Google and all these large internet-scale players are obviously going that route. It's forced the enterprise to take a look at them and ask the question, 'If they can do a lot more with a lot less, why can't we?' That's why this is becoming very prevalent in the enterprise today.
Does the model work for an organization that won't reach the economy of scale of Amazon, Google or Microsoft?
Lakshman: I think the economic model will work for everybody. But will the economic model work for us? That's what we need to be more cognizant of. What I mean when I say that is: We want to go into environments where there is data growth, where we can grow with the company. We wouldn't want to target perhaps small businesses, where we go in and they know that they have a fixed amount of data, and it's never going to grow over the next five or 10 years. That may not be an economically feasible model for me, but it still would be economically very feasible for them.
Is there a data threshold at which your platform becomes economically feasible for you and for the customer?
Lakshman: We are targeting medium to large enterprises at this point. We don't really care how much you have today, but we want to see that there's a potential for huge growth over time. Let's say you are a customer who wants to start with 100 TB, but there is a potential for that 100 TB to grow to multiple petabytes over the next two, three, four years. That works beautifully for us. But if you're a customer who has 100 TB and you're going to remain there for the next five years, we would still not say no to the business, but that's not our ideal customer.
Lakshman: We're going to charge them on capacity, be it perpetual or subscription, and that price never changes. So, [Hedvig and the customer] want to get more and more and more out of the system over time. With most of the features that we have, the real benefits come as and when there is data explosion, and new customers may not see the benefits of our systems if their data is stagnant and not growing.
Can you really take storage software and run it on any hardware, or is some tuning required?
Lakshman: There is a perception now that software is king, and that it can run on any kind of hardware, which is not necessarily true. You don't want to rely on specific hardware characteristics. For example, you don't want your software to depend on, say, compression in an ASIC or dedupe in an ASIC. But you do need the hardware to be potent enough to deliver on the promise of your software. We recommend specific SKUs that people commonly use. The good thing about off-the-shelf hardware is that you get the same SKU across every server vendor.
Hedvig supports inline deduplication and compression. How do you deal with the performance hit?
Lakshman: Architecturally, we are very different from traditional systems, and that's what allows us to do this inline. We have designed our systems to be log-structured, which is very flash-friendly by default and also helps with disk-based workloads. Because of the way these systems are designed, random writes are taken completely out of the equation.
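Lakshman ties inline dedupe and compression to a log-structured design. The toy Python class below is a minimal sketch of that general idea under stated assumptions: blocks are deduplicated by SHA-256 content hash, compressed with `zlib`, and always appended to the tail of a single log, so an overwrite becomes an index remap rather than a random in-place write. The class name and structure are hypothetical, not Hedvig's implementation, and garbage collection of stale log entries, which any real log-structured system needs, is omitted.

```python
import hashlib
import zlib


class LogStructuredStore:
    """Toy append-only block store with inline dedupe and compression (hypothetical)."""

    def __init__(self) -> None:
        self.log: list[bytes] = []           # append-only sequence of unique, compressed blocks
        self.by_hash: dict[str, int] = {}    # content hash -> log offset (dedupe index)
        self.block_map: dict[int, int] = {}  # logical block address -> log offset

    def write(self, lba: int, data: bytes) -> int:
        digest = hashlib.sha256(data).hexdigest()  # hash the uncompressed content
        offset = self.by_hash.get(digest)
        if offset is None:
            # New content: compress inline and append at the tail (sequential write only).
            offset = len(self.log)
            self.log.append(zlib.compress(data))
            self.by_hash[digest] = offset
        # Duplicate or not, the logical address is simply repointed; nothing is rewritten in place.
        self.block_map[lba] = offset
        return offset

    def read(self, lba: int) -> bytes:
        return zlib.decompress(self.log[self.block_map[lba]])


store = LogStructuredStore()
store.write(0, b"A" * 4096)
store.write(9, b"B" * 4096)
store.write(5, b"A" * 4096)  # duplicate content: no new append
store.write(0, b"C" * 4096)  # overwrite: new block appended at the tail, LBA 0 remapped
print(len(store.log))        # 3
```

Because the write path only ever appends, the device sees sequential I/O regardless of the logical access pattern, which is one reason such designs suit flash and also help spinning disks.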
Clouds such as Amazon, Google and Microsoft use erasure coding. Is Hedvig's model based strictly on replication?
Lakshman: Erasure coding is on the roadmap. I think it's about Q2 next year.
What do you see as the pros and cons of erasure coding versus replication?
Lakshman: Erasure coding is good for data sets where you store data but rarely read it, so more like long-term archival situations, because reads can be extremely slow. That's one of the reasons erasure coding may not be a good fit for real-time systems.
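The tradeoff can be illustrated with the simplest possible erasure code, a single XOR parity block over k data blocks (similar in spirit to RAID 5). This is an assumption for illustration only: production systems typically use Reed-Solomon-style codes that survive several failures, and the interview does not say which scheme Hedvig plans to ship. The sketch shows that k = 4 plus one parity block stores 1.25x the raw data, versus 3x for three-way replication, but rebuilding one lost block requires reading all four survivors, whereas a replica can be read directly.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Single-parity erasure code: k data blocks plus one XOR parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover at most one missing block (None) by XOR-ing every surviving block."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost block")
    stripe = list(stripe)  # work on a copy; the caller's list is left untouched
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe[missing[0]] = xor_blocks(survivors)  # needs k reads, not 1
    return stripe


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]  # k = 4 data blocks
stripe = encode(data)                         # 5 blocks stored: 1.25x overhead
damaged = stripe.copy()
damaged[2] = None                             # lose one data block
print(rebuild(damaged)[2])                    # b'ijkl'
```

With three-way replication the same failure costs nothing at read time, because any surviving copy serves the request; the price is storing 3x the data instead of 1.25x.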
Does Hedvig focus mainly on primary storage?
Lakshman: We are deploying in primary, and we also have deployments where we end up being a backup target. Basically, we are a replacement for Data Domain. A lot of people have been looking at us that way, too.
What was your early vision of what Hedvig software would be good for, and how did that change?
Lakshman: For me, from Day 1, there was never a doubt in my mind that most workloads could be supported on a system like this, and that has only been borne out over time. For example, we announced that BNP Paribas is our flagship customer. The first thing they did was put VM-like [virtual machine-like] workloads on us. We handled that without any issue. Then, we started looking at database workloads. People were skeptical, but we ran them, and now that's something we don't even worry about. Then, they started running big-data-like workloads. Initially, there was a mental block about whether we'd be successful, because it's shared storage. We were able to handle that, too. We had never done backup before, but we had a nice opportunity to get into the backup space, and we proved ourselves there. So, we have been knocking them off one after the other.
The end goal is not very easy to state because it also completely depends on the ecosystem around the compute and how that is going to evolve.
What are you thinking of when you say compute?
Lakshman: Containers are really gaining a lot of ground now. If containers really take off, I don't know how we would react to it. What they come out with, we do not know; we cannot predict it. But we've still got to be relevant no matter who it is, because we shouldn't really bet on any one thing. We should be able to stay relevant no matter what happens in the industry. The industry is in flux right now, with hypervisor-based virtualization becoming more old-school and a big push toward containerization.