WekaIO CEO Liran Zvibel has a two-pronged plan for launching the parallel-file-system startup to success: He intends to maximize relationships with large partners and fill a need for storage for artificial intelligence networks.
Zvibel pointed to recent partnerships with Hewlett Packard Enterprise, Mellanox and Amazon Web Services (AWS) as possible launching pads for WekaIO's Matrix parallel file system.
He said he sees Matrix as a good fit for traditional NAS workloads, such as media rendering and life sciences, but he will concentrate on any market that will help the startup grow. That includes AI applications that pose a challenge with large numbers of small files.
WekaIO's scale-out file system can span on-premises and public cloud storage. It pools high-performance flash storage for a hot data tier and offloads colder data to Amazon Simple Storage Service (S3) and OpenStack Swift-compliant object storage that uses hard disk drives.
WekaIO is headquartered in San Jose, Calif., with engineering offices in Tel Aviv, Israel. Zvibel shifted from CTO to CEO late last year. He replaced former CEO Michael Raam, who joined WekaIO at the end of 2015 to get the U.S. office up and running.
"He took the company through our launch," Zvibel said of Raam.
In this Q&A, Zvibel discussed recent WekaIO news, customer trends and his predictions for the future.
What are the most important developments that have happened since WekaIO launched the Matrix file system in July?
Liran Zvibel: The most exciting thing for us as a company is our agreement with HP Enterprise. They will be reselling WekaIO on their platform. We're going to be their high-performance file system. We're working directly with the HPC [high-performance computing] and machine learning group, and we're a very good fit for a lot of their customers. We had been working on it for well over a year.
Another big change is a partnership with Mellanox around InfiniBand. Now, we support either Ethernet or InfiniBand, and we're seeing that a lot of the very interesting use cases for us are actually around InfiniBand for the very high-end life sciences workloads or machine learning.
The big thing we've added on the public cloud is that we're officially on the AWS Marketplace. Now, customers can provision their own WekaIO cluster through their own AWS account, and everything is automatically generated.
Another thing that changed is our snap to object storage functionality. Customers have been asking other storage vendors for it for years. When you have tiering to object storage, you can now take that snapshot and push it to the object storage completely in a way that doesn't require the original cluster. We let you leverage the object storage as a third-party storage solution and enable DR [disaster recovery] if it's available from another data center.
It also enables public cloud use cases. Now, we allow customers to tier to AWS S3 and push a snapshot to S3. The other one is cloud bursting. You can have your on-prem cluster. You do the work. Now, you realize you need a lot more resources. Let's say you take a snapshot. You push it to AWS S3. Now, you can provision your compute and storage clusters on AWS. Our performance linearly scales, so double the instances will get you double the performance and get you the results in half the time. You decide how quickly you want the results.
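As a rough illustration of the scaling arithmetic Zvibel describes, the sketch below assumes ideal linear scaling, where aggregate throughput is the instance count times a fixed per-instance rate. The function name, the dataset size and the per-instance throughput are hypothetical values chosen for the example, not WekaIO figures.

```python
def time_to_result(dataset_gb, instances, gb_per_sec_per_instance):
    """Seconds to process a dataset under an assumed ideal linear-scaling model."""
    if instances <= 0 or gb_per_sec_per_instance <= 0:
        raise ValueError("instances and per-instance throughput must be positive")
    # Assumption: aggregate throughput grows in direct proportion to instance count.
    aggregate_gb_per_sec = instances * gb_per_sec_per_instance
    return dataset_gb / aggregate_gb_per_sec


# Hypothetical numbers: 1,000 GB of data, 1 GB/s per instance.
baseline = time_to_result(1000, 10, 1.0)
doubled = time_to_result(1000, 20, 1.0)
print(baseline, doubled)
```

Under this model, doubling the instance count from 10 to 20 halves the time to a result. Real clusters may fall short of the ideal once the network or the object store becomes the bottleneck.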
Do you expect to partner with more cloud providers?
Zvibel: We're currently actively talking with the other cloud providers. Our wish is to have all three, so you could migrate workloads between AWS, GCP [Google Cloud Platform] and [Microsoft] Azure. There are quite a lot of commercial details to close around things. It's not technology only.
How many customers do you have?
Zvibel: We have about 10 customers, and we have probably 40 in some advanced PoC [proof-of-concept] stages.
What sorts of problems are your customers trying to solve?
Zvibel: For the first use case, we are focusing on customers transitioning from CPUs to GPUs [graphics processing units]. People realized about three or four years back that GPUs are a lot more efficient for running deep learning, artificial intelligence networks. Vendors sell GPU-filled servers at about $100,000, and customers want to be able to scale them. So, they spent millions on the compute side. Then, they sit idling around, or they cannot scale further. We show them that we can just fill up their pipes. A lot of these use cases are InfiniBand.
The training [of artificial intelligence networks] today goes over tiny files -- text samples, voice samples, images. The previous generation of parallel file systems could get throughput out of huge files, but they couldn't get throughput out of small files because their metadata wasn't good enough. We show these customers that we can actually get them the throughput that they need.
Another kind of customer that we're looking for are the life sciences customers. The new genomics data sets contain a huge amount of very small files. And the old file systems just cannot handle them. We have solved the current metadata issues. We can read and write to these small files extremely efficiently. So, we are showing these genomic customers that are still mostly using CPUs -- central processing units -- that we let them scale their projects linearly.
Does WekaIO have a true parallel file system with simultaneous, coordinated I/O between the clients and storage?
Zvibel: You could think of it as a double parallel file system. We have a flash-optimized parallel file system for the hot tier. Then, between the flash and the object storage, we're also a parallel file system. You run the large objects in parallel to the object storage. We take the large file. We chop it into small pieces. Each server of the WekaIO cluster handles a different piece of that file, and we'll put it or get it from the object storage concurrently with other nodes.
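The chop-and-distribute pattern described in that answer can be sketched in a few lines. This is only a minimal illustration, not WekaIO's implementation: the in-memory dictionary stands in for an S3- or Swift-style object store, the thread pool stands in for cluster servers, and the chunk size, node count, key format and function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; deliberately tiny for the example (hypothetical value)
NUM_NODES = 3   # hypothetical number of cluster servers

object_store = {}  # in-memory stand-in for an object storage bucket


def split(data, size=CHUNK_SIZE):
    """Chop a large file into fixed-size pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def put_chunk(name, index, chunk):
    """One 'server' stores one piece; returns the node that handled it."""
    node = index % NUM_NODES  # round-robin assignment (an assumption)
    object_store[f"{name}/{index:08d}"] = chunk
    return node


def put_file(name, data):
    """Store all pieces concurrently; returns (piece count, node per piece)."""
    chunks = split(data)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        nodes = list(pool.map(lambda ic: put_chunk(name, ic[0], ic[1]),
                              enumerate(chunks)))
    return len(chunks), nodes


def get_file(name, count):
    """Fetch all pieces concurrently and reassemble them in order."""
    keys = [f"{name}/{i:08d}" for i in range(count)]
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        return b"".join(pool.map(object_store.__getitem__, keys))


count, nodes = put_file("genome.dat", b"abcdefghij")
restored = get_file("genome.dat", count)
```

Because each piece is stored under its own key, no single server has to move the whole file, and the read path can fetch pieces in parallel and join them in index order.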
What are the main distinctions between WekaIO's Matrix file system and Lustre or GPFS, which IBM now calls Spectrum Scale?
Zvibel: We can get very high throughput for tons of small files. The other difference is that we have lifted any metadata restrictions that they had. We can have directories of billions of files, and these directories work as well as directories with a thousand files. If you go to a Lustre, for example, and try putting even a million files in that directory -- and nowadays, people do it -- the metadata operation becomes so slow that it's basically unusable.
What do most of your potential customers use now?
Zvibel: At the standard supercomputing centers, they've compared us to Lustre. We see quite a lot of GPFS. Machine learning has the pure flash play, which is very popular, though it's not parallel, and it's not even scale out. It's just a flash-based NAS. It was able to solve the high throughput for small files up to a point, but it doesn't let you scale. So, in the best case, they can start the project with a flash play, and then they have a problem.
In life sciences, we see a lot of Isilon. If they became I/O-bound, the Isilon [all-flash] Nitro pushes it a bit, but it becomes I/O-bound at some point with the Nitro, as well. Quite a lot of customers still have Panasas running. I don't think they consider them for future purchases, but it's not as if Panasas disappears from the market.
Do you envision going beyond the HPC and life sciences market to the general enterprise?
Zvibel: AI is going to get everywhere. AI is the new big data. Hadoop cases started at the small hyperscalers, then they moved to the enterprise. All enterprises now realize that they have to start solving the same problem that the hyperscalers solved. And, eventually, they will have the same I/O problems. So, this is going to be our easiest way in there.
What are your predictions for this year's biggest storage trends?
Zvibel: This is going to be the year where more and more people are going to get into serious AI projects. The other big thing is that more and more organizations will start leveraging the cloud for its elasticity before they take the full-on move everything there. Totally moving to the cloud is extremely difficult. What we will see people doing is sync with the cloud, moving to the cloud anything that doesn't run all the time -- so, your monthly report or doing DR to the cloud.