Object storage vendor Scality recently raised $22 million in Series C funding, bringing its total investment to $35 million. CEO Jerome Lecat said he plans to use the new money to build out sales and marketing, along with technology enhancements to make the Scality Ring software easier to implement. The funding comes as the object-oriented storage market shows signs of life. Larger organizations are starting to see the technology as a way to tame massive unstructured data stores, while developments such as open source OpenStack Swift and EMC Corp.'s ViPR project bring object storage into the spotlight.
While those developments may create a market for Scality and other smaller object storage vendors, they also make for greater competition. SearchStorage spoke to Lecat about how object-oriented storage fits into the overall market, and how Scality intends to grow with that market.
What is the best use case for object-based storage?
Jerome Lecat: In terms of real usages, object storage is deployed for SaaS [Storage-as-a-Service] or cloud storage. It's good for a photo gallery, online storage needs such as backup or active archive, email services, analytics based on Hadoop, and some verticals such as high-performance computing [HPC], health care, data log collection or compliance needs common to many industries. In terms of the technical angle, users consider object storage solutions when they reach the limit of classic SAN and/or NAS [network-attached storage] approaches.
Most object storage vendors are still struggling for customer adoption and acceptance. Is it because object-based storage is a technology that's a bit ahead of its time? Does IT have a compelling reason for the technology change?
Lecat: The main reason for this limited adoption is the access methods used to connect to and talk with the object storage. Companies need expertise to develop the right application connector based on vendor APIs [application programming interfaces], and such expertise isn't common. Also, users can't modify commercial applications and are stuck with classic data access methods. This is the reason why Scality's Scale Out File System [SOFS], in addition to our HTTP/REST APIs, offers flexible choices to consume the storage for the application user. We'll continue to offer multiple access methods -- NFS now, and we'll announce CIFS in the next few weeks.
Is the lack of a standards-based interface an inhibitor to adoption?
Lecat: Clearly, the inhibitor to adoption is the access method. Standards are key at that level. The main standards in object storage today are Amazon S3 as the de facto standard and SNIA's [Storage Networking Industry Association] CDMI [Cloud Data Management Interface] as the industry standard.
How do organizations know if object storage is right for them?
Lecat: Early in the sales cycle, we discuss with the customer what they want to do. We don't try to sell [it] if it's not a fit. We're not good for a 50 TB cloud. We're not good at taking a bunch of multiple applications on a shared NAS system that rely heavily on snapshots -- that's not our core strength. We target customers with a large number of files and volume of data. It doesn't mean we can't start small. We look at the growth rate, one that doubles every year. Object storage isn't for everything [and] people need to understand that.
Does object storage have the potential to replace NAS as the top choice for storing unstructured data?
Lecat: This is a reality today as the biggest NAS systems on the market can't scale beyond 20 PBs. So what can you do after this limitation? Do you multiply clusters? Sixty petabytes means three times 20 PB clusters. This creates complexity and increases costs. What about just the next increment for the additional 1 PB? Do you consider 20 plus 20 plus 20 plus 1, and have a very small fourth cluster that's completely unbalanced with others? For large data environments, clearly object storage is a real alternative to NAS, especially if they're able to provide some file-sharing protocols such as NFS or CIFS.
The idea behind object storage is to avoid any limitation factors. It's a scale-out model in a shared-nothing design.
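The cluster arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration, assuming the 20 PB per-cluster ceiling quoted above; the function name `plan_clusters` is hypothetical and does not come from any vendor tool.

```python
def plan_clusters(total_pb, max_cluster_pb=20):
    """Split a capacity target into clusters capped at max_cluster_pb.

    Each cluster is filled to the cap before a new one is started,
    so any remainder ends up in a smaller, unbalanced last cluster.
    """
    if total_pb < 0 or max_cluster_pb <= 0:
        raise ValueError("capacities must be non-negative, cap must be positive")
    full, remainder = divmod(total_pb, max_cluster_pb)
    clusters = [max_cluster_pb] * full
    if remainder:
        clusters.append(remainder)
    return clusters


print(plan_clusters(60))  # [20, 20, 20]
print(plan_clusters(61))  # [20, 20, 20, 1]
```

The second call shows the case Lecat describes: one extra petabyte forces a fourth cluster that is far smaller than the other three.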
Some vendors are layering a file system over the object storage. Why do that? Doesn't that limit the scalability that object-oriented storage offers?
Lecat: We see this as well. With file system access, the market and business opportunities are larger and easier to convert. A file system isn't a new animal for the market. But if the file system doesn't leverage the full distributed architecture, it could introduce some bottlenecks and degrade performance and response time. There's a clear convergence of technology in the market with HPC, grid, Web services, and parallel file and object access.
Some of the integrations with Hadoop that take advantage of the parallel, scale-out object store architecture look extremely promising. Do you agree?
Lecat: Hadoop has several good properties. It immediately means you don't need a compute cluster plus a storage cluster, thus doubling your hardware and your cost. Scality belongs to the category where you can run jobs on the same storage cluster as our software, the Ring. Our software is deployed on classic x86 servers. So when you elect a Hadoop job, the idea is to run it on the node that stores the data this job needs to process. The Hadoop Distributed File System has some limitations, and some market offerings choose to replace this layer with their own module. This is the case for Scality, as we leverage Scality SOFS glued with CDMI to offer computing in place without the need to extract, transfer and load data from any sources, and especially a separate cluster. This is a convergent computing play.
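The idea of running a job on the node that already stores its data can be sketched as follows. This is a toy illustration, not Hadoop's scheduler or Scality's implementation; the `placement` map, the node names and the `pick_node` function are all assumptions made for the example.

```python
def pick_node(block_id, placement, busy=frozenset()):
    """Pick a node that stores block_id locally, preferring idle nodes.

    placement maps a node name to the set of block IDs it stores.
    Returns None when no node holds the block.
    """
    holders = sorted(node for node, blocks in placement.items() if block_id in blocks)
    idle = [node for node in holders if node not in busy]
    candidates = idle or holders  # fall back to a busy holder rather than move data
    return candidates[0] if candidates else None


# Hypothetical two-node layout.
placement = {
    "node-a": {"b1", "b2"},
    "node-b": {"b2", "b3"},
}

print(pick_node("b1", placement))                   # node-a
print(pick_node("b2", placement, busy={"node-a"}))  # node-b
print(pick_node("b9", placement))                   # None
```

The point of the sketch is that the work is routed to the data, so nothing has to be extracted and copied to a separate compute cluster first.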
Object storage can use replication or erasure coding to protect data. When is replication better, and when is erasure coding preferred?
Lecat: Both data protection techniques are good, providing high durability levels. Replication offers a simple but efficient method. There's no data transformation and access is direct. Data is just copied in different locations within the cluster and recovery is very fast thanks to very efficient parallel regeneration processes. Replication is good when you have small objects. When objects get bigger, replication is no longer the right approach, because re-copying the data takes time. Also, at a large scale, the hardware redundancy can be prohibitive and limit adoption because of the overall solution cost. Imagine you consider a cluster of 10 PBs of usable storage and you need 20 PBs of additional storage if you use three-way replication. So this is more of a financial issue than a technical one.
The second approach is a technique based on erasure coding, where you don't multiply data instances, but you generate parities, checksums or equations that integrate the protection schema. So the simple rule of thumb based on experience and data protection methods is that erasure code techniques are recommended for large-scale environments or large data sets, and you should pick replication for small data sizes and small clusters.
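The capacity trade-off between the two schemes can be worked through with a short sketch. The three-way replication figures match the 10 PB example above; the erasure-coding layout of 8 data fragments plus 4 parity fragments is an assumption chosen for illustration and is not a configuration mentioned in the interview. The function names are hypothetical.

```python
from fractions import Fraction


def replication_raw(usable_pb, copies=3):
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_pb * copies


def erasure_raw(usable_pb, data_fragments, parity_fragments):
    """Raw capacity needed when each object is split into data fragments
    and extended with parity fragments."""
    total = data_fragments + parity_fragments
    return usable_pb * Fraction(total, data_fragments)


usable = 10  # PB of usable storage, as in the example above

rep = replication_raw(usable)    # three-way replication
ec = erasure_raw(usable, 8, 4)   # assumed 8+4 layout

print(rep, rep - usable)                  # 30 20
print(float(ec), float(ec - usable))      # 15.0 5.0
```

Under these assumptions, replication needs 20 PB of extra hardware for 10 PB of usable space, while the 8+4 layout needs 5 PB, which is why the cost argument favors erasure coding at large scale.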