The performance of object storage has improved to the point where it can be considered for primary data as well as for archival data, its original main use case.
Marc Staimer, president and founder of Beaverton, Ore.-based Dragon Slayer Consulting, said that although object storage software introduces latency, and the metadata associated with it adds extra time for reads and writes, vendors have worked hard to improve performance.
In this podcast interview with TechTarget's Carol Sliwa, Staimer explains what manufacturers have done to speed up object storage, what types of data are suitable for object storage and what users can do to tune performance for their applications.
How would you characterize the performance of object storage systems today in comparison to three to five years ago?
Marc Staimer: It's much improved because the underlying technology is much improved, but so are the algorithms. The thing to understand about object storage as you talk about performance is object storage is a software layer that runs on top of block storage. Think of it like file storage that is a software layer that runs on top of block storage. So, you're interfacing with the software. Like all software, the amount of software affects the performance ultimately because it introduces latency. And because object storage adds metadata to everything you write, it takes more time to write and more time to read. And if you do things like erasure codes, then again, it adds more time. But most of the object storage providers have been working hard at making that a faster process.
What have the manufacturers done to improve the performance of object storage?
Staimer: They're taking advantage of some of the technologies that have come out. I mean, most storage today is x86-based, and object storage is definitively tied to x86 type of architectures. And, of course, Intel has made the processors faster -- more cores, more memory capability and more ability to address more memory. So, what you end up with is a better ecosystem for object storage in general as the x86 platform continues to evolve. In addition, they've been able to take advantage of flash storage underneath because it's a server-based storage architecture, because the object storage runs in the server plus embedded storage. Using flash as caching, using flash as tiering, the object storage vendors have been able to really take advantage of that infrastructure to provide much, much faster storage. And [faster] to the point where there are storage systems out there today that are block-based and underneath use object to take advantage of some of object's unlimited scalability, distribution and ability to scale on demand without any changes or any disruptions. So, that functional ecosystem underneath has really helped improve performance.
Has it helped to improve performance enough to the point where object storage becomes suitable for primary data?
Staimer: Yes, it can, because candidly, for most applications the storage system will be indistinguishable from the file or block storage systems they might use otherwise. The underlying ecosystem -- hardware, media, processing, memory -- has improved to the point where it's going to be very difficult to tell the difference. The biggest difference still comes down to latency, but the latency differences have shrunk significantly.
What types of primary data are we talking about with object storage? Databases? Or other types of data that the user would access on a frequent basis?
Staimer: It depends. There's OLTP -- online transaction processing -- or online analytical processing, or OLAP. But, generally speaking, when you talk about that kind of performance, transaction processing, you're probably not going to use object storage. You want the lowest latency possible because it's an IOPS issue. How many I/Os per second can you get to speed up your database? That's a different animal. That's where latency will rear its head.
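The link between latency and IOPS that Staimer describes can be approximated with Little's law (throughput equals concurrency divided by latency). The following is a minimal sketch, not a benchmark: the function name, the number of outstanding I/Os and the latency figures are illustrative assumptions and do not come from the interview.

```python
def max_iops(latency_ms: float, outstanding_ios: int = 1) -> float:
    """Upper bound on I/Os per second for a given per-I/O latency.

    Little's law: throughput = concurrency / latency.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # 1000 ms per second, so each outstanding I/O completes 1000 / latency_ms times.
    return outstanding_ios * 1000.0 / latency_ms


# Illustrative latencies only; these are assumptions, not measurements.
for latency_ms in (0.5, 5.0):
    iops = max_iops(latency_ms)
    print(f"{latency_ms} ms per I/O, 1 outstanding I/O: up to {iops:,.0f} IOPS")
```

Under these assumptions, every added millisecond of software latency directly lowers the ceiling on transactions per second, which is why latency matters most for OLTP-style workloads.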
But if you're talking about databases like email, if you're talking about databases such as casual databases or ones that might be used in day-to-day but not necessarily transaction processing, yeah, it'll be fine. The issue then becomes the interface in front of the object storage. Most applications aren't going to use a REST interface, which is the standard interface on most object storage. But, today, you're going to find that a lot of them have a block interface, a file interface or they'll have a gateway that will convert to it. So, it's indistinguishable, as I said, for the vast majority of applications.
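For readers unfamiliar with the REST interface mentioned above, writing an object typically amounts to an HTTP PUT to a URL that names the object. The sketch below only builds such a request with Python's standard library and sends nothing. The endpoint, bucket name, object key and metadata header are all hypothetical, and real products differ in URL layout and authentication.

```python
from urllib.request import Request


def build_put_object(endpoint: str, bucket: str, key: str, body: bytes) -> Request:
    """Build (but do not send) an HTTP PUT that would store one object."""
    url = f"{endpoint.rstrip('/')}/{bucket}/{key}"
    return Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            # Hypothetical user-metadata header; actual header names vary by product.
            "X-Example-Meta-Owner": "demo",
        },
    )


# Hypothetical endpoint and names, for illustration only.
req = build_put_object(
    "https://objects.example.com", "mail-archive", "2014/04/msg-0001.eml", b"hello"
)
print(req.get_method(), req.full_url)
```

An application written against a file or block interface never issues requests like this itself, which is why the gateways and native file or block front ends Staimer mentions matter.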
Does flash used with object storage open the door to the possibility of using it for transaction-based data as well?
Staimer: It's possible. It depends on the performance that you require. I would always recommend testing first before you make the leap. Make sure it's going to work for you.
Speaking of software, are there significant differences between the performance of one vendor's object storage and another vendor's object storage?
Staimer: There are. It comes down to how well they've architected their software, how well they can take advantage of the underlying ecosystem. Some are going to be significantly faster than others. I'll give you an example. Let's say you're using erasure coding, which today is kind of table stakes in object storage. If you're going to use erasure coding for every size object that you get, you're going to get terrible performance in small objects and great performance in large objects. Not a great idea.
So, by having the ability to use, let's say, multi-copy mirroring, where you're creating two copies for every original copy you have or three copies total for small objects, then you're not going to have the latency overhead of erasure coding. At some point when that data gets old, you may want to convert it to erasure-coded data, but while it's young and highly active and being accessed a lot, and it's small objects to begin with, you may elect not to do that.
On the other hand, you may be using a database. So, if your software allows you to make that decision that for this data for this period of time, as long as it's 30 days or younger, it's not going to be erasure coded, but as it gets older, it will, then you can have better performance than somebody who doesn't do that.
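The kind of policy Staimer describes, mirroring small and recently written objects and erasure coding everything else, could be sketched as follows. This is an illustration under stated assumptions, not any vendor's implementation: the 1 MiB size threshold, the scheme labels and the function name are hypothetical, and only the 30-day window comes from the interview.

```python
from datetime import timedelta

# Assumption: objects under 1 MiB count as "small"; the interview gives no figure.
SMALL_OBJECT_BYTES = 1024 * 1024
# The 30-day window is the example used in the interview.
HOT_WINDOW = timedelta(days=30)


def protection_scheme(size_bytes: int, age: timedelta) -> str:
    """Choose a data-protection scheme from object size and age."""
    if size_bytes < SMALL_OBJECT_BYTES and age <= HOT_WINDOW:
        # Small and still active: keep three full copies, no encoding overhead.
        return "replicate-3x"
    # Large objects, or small objects that have aged out: erasure code them.
    return "erasure-code"


print(protection_scheme(64 * 1024, timedelta(days=2)))      # small, young
print(protection_scheme(64 * 1024, timedelta(days=90)))     # small, old
print(protection_scheme(500 * 1024**2, timedelta(days=2)))  # large, young
```

Combining the size and age conditions with a logical AND is one reading of the interview; a given product may expose these as separate settings.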
Is there anything a potential end user can do to improve the performance of object storage?
Staimer: The first thing is always work with the vendor who's supplying it, unless you're working with open source and there isn't a vendor supplying it. But, generally speaking, work with the vendor. They know all the tricks of the trade. They know how to make it the fastest possible for different workloads. The things you can do are add flash [solid-state drives] SSDs, add flash PCI Express or even add flash in-memory DIMMs, any of which can really speed it up. In some cases, you can move the software completely in-memory, so that will speed it up. In other cases, it depends on the interface. Are you connecting over 1 Gb or 10 Gb Ethernet? It's a complete infrastructure look. How you are setting this up will affect your performance. Are you using converged Ethernet, or are you using standard Ethernet? What's the other traffic? Did you set up a separate Ethernet infrastructure for it, or is this running co-tenant with everything else? So, there are a lot of different ways to tune this. Most object storage systems have the ability to be tuned, to be tweaked for specific workloads. So, again, you want to work closely with the vendor to make sure it's optimized for your workloads.
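To put the 1 Gb versus 10 Gb question in perspective, a back-of-the-envelope calculation of the best-case time to move an object over each link can help. This sketch assumes a dedicated link with no protocol overhead or competing traffic, so real transfers will be slower. The function name and the 1 GB object size are illustrative assumptions.

```python
def transfer_seconds(size_bytes: int, link_gbps: float) -> float:
    """Best-case seconds to move size_bytes over a link of link_gbps gigabits/s.

    Ignores protocol overhead, latency and other traffic on the link.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


one_gb = 10**9  # a 1 GB object (decimal gigabyte), chosen for illustration
for gbps in (1, 10):
    print(f"{gbps} Gb link: {transfer_seconds(one_gb, gbps):.1f} s best case")
```

If the object store shares the link with other traffic, the effective `link_gbps` drops accordingly, which is the point of Staimer's question about running co-tenant with everything else.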
A lot of end users are still trying to wrap their arms around what they're going to use object storage for. From a performance standpoint, what kind of advice or recommendations would you offer them as they contemplate the use of object storage?
Staimer: Object storage has been gaining more and more acceptance for a variety of reasons. But one of the key reasons is data durability, data resilience. If you're using object storage with erasure coding or geolocation erasure coding -- in other words, multiple sites, multiple geographic locations -- any one site can access the data. Instead of having, let's say, three sites with three copies of the data total, you may have three sites with two-thirds more data. But, from any one site, because you can access all the data without going to another site, that's a whole lot less overhead. Instead of 300% storage for 100% of the data, you would have roughly 167% storage for three viable copies of the data. It would seem that way to you. So, it's more cost-effective. It's more resilient. It will protect you against more failures.
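The storage arithmetic behind this comparison can be worked through directly. With full copies, raw capacity is the number of copies times the data. With erasure coding that splits each object into k data fragments plus m parity fragments, raw capacity is (k + m) / k times the data. The sketch below assumes a 3 + 2 layout purely because it reproduces the "two-thirds more" figure; the interview does not name a specific scheme, and the function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw capacity needed per unit of data when storing full copies."""
    return Fraction(copies, 1)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw capacity needed per unit of data with k data + m parity fragments."""
    return Fraction(data_fragments + parity_fragments, data_fragments)


three_copies = replication_overhead(3)
# Assumption: a 3 + 2 layout, chosen only to match "two-thirds more data".
three_plus_two = erasure_overhead(3, 2)

print(f"3 full copies:     {float(three_copies):.0%} of the data size")
print(f"3 + 2 erasure code: {float(three_plus_two):.0%} of the data size")
```

In this assumed layout, any two of the five fragments can be lost without losing data, which is the same number of losses three full copies tolerate, at a little over half the raw capacity.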
There's a lot of use for object storage for what I call passive data -- data that's not accessed very frequently, data that's going to be around for a long time but not necessarily actively used. That's a huge amount of data. In fact, more than 90% of most people's data is passive. All you have to do is look at the data on your laptop and you'll see that.
So, if that's going into object storage that makes a lot of sense. It'll save you money. It'll be more resilient because it'll protect you against more failures than two drives, even more than a whole location failure. You don't have to do [disaster recovery] DR if it's spread out over multiple locations. There's a lot of positives and a lot of places to use it. You can even use it on active data if it's fast enough for you.
This was first published in April 2014