
Implementing Big, Fast Persistent Memory

Persistent memory software architect Steve Scargall explains how it is possible to bridge the gaps between memory and storage.

Download the presentation: Implementing Big, Fast Persistent Memory

00:04 Speaker 1: So hi everyone. Welcome to this session. My name is Steve Scargall, I am a persistent memory software architect at Intel, and I'd like to take you on a journey of how we got to persistent memory and how we are today closing the memory-storage divide. To better understand the context of persistent memory computing architecture, we need to take a quick step back in time to the beginning of modern computing architecture as we know it. Most of you will be familiar with the von Neumann architecture, first published by John von Neumann in 1945. His design has a CPU consisting of a control unit, an ALU and some registers; memory to store volatile (non-persistent) data; and input/output devices attached to buses. The von Neumann architecture is based on the stored-program computer concept, where instructions and program data are stored in the same memory unit. His design is still with us today; most computers are based on his architecture.

01:08 S1: In 1946, Arthur Burks, Herman Goldstine and John von Neumann released a paper entitled Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, in which they say: ideally one would desire an indefinitely large memory capacity, such that any particular 40-bit number or word would be immediately available, and that words could be replaced with new words at about the same rate. It does not seem possible physically to achieve such a capacity. We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible. Back then, they were talking about access times on the order of one to 100 microseconds; today we mostly have 64-bit computers with memory latencies measured in nanoseconds. But their intent remains the same today as it was back then: our computer systems work best if we can keep all of our data in big, fast memory located as close to the CPU as possible. And since it's not possible to build an infinitely large memory pool, we must continue to use their concept of a memory and storage hierarchy.

02:21 S1: Here's the memory and storage hierarchy pyramid described in that paper. It shows both volatile and non-volatile storage technologies, and I've added their approximate latencies based on what we have today. This architecture has worked, and continues to work, because of the 90/10 locality rule, which states that 90% of I/O accesses go to about 10% of the data. As a result, we do not need to keep all the data closest to the CPU, only the data that we're currently working on or are about to use.
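The sketch below shows why locality matters and why the slow tier still dominates. It uses a simple two-tier model: a fraction h of accesses (the hit rate) is served by the fast tier, and the rest go to the slow tier. The average latency is then T_avg = h·t_fast + (1 − h)·t_slow. The latency figures are illustrative assumptions, not numbers from the talk's slides: roughly 100 ns for DRAM and roughly 100 µs for a NAND SSD read. The model also ignores the cost of the fast-tier lookup on a miss.

```python
def average_access_latency_ns(hit_rate, fast_ns, slow_ns):
    """Expected latency when a fraction `hit_rate` of accesses is served by the
    fast tier and the remaining accesses fall through to the slow tier."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * fast_ns + (1.0 - hit_rate) * slow_ns


# Illustrative figures only (assumed, not taken from the presentation).
DRAM_NS = 100           # ~100 ns DRAM access
NAND_SSD_NS = 100_000   # ~100 us NAND SSD read

for h in (0.99, 0.9, 0.5):
    latency = average_access_latency_ns(h, DRAM_NS, NAND_SSD_NS)
    print(f"hit rate {h:.0%}: {latency:,.0f} ns")
```

Under these assumptions, a 90% hit rate already yields an average of about 10 µs. That is roughly 100 times the DRAM latency, which is why a storage performance gap matters even when most accesses are served from memory.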

02:54 S1: We often refer to this data as the hot data, or the working set. As our data sets continue to grow, we frequently find that our working set size doesn't fit into memory, and this causes performance problems. The result is an obvious memory capacity gap and a storage performance gap.
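The following simulation illustrates how performance can fall off a cliff once the working set outgrows memory. It treats memory as a hypothetical cache with least-recently-used (LRU) replacement and replays a cyclic scan over the working set. The cyclic scan is a deliberately worst-case access pattern, and real operating-system page replacement is more sophisticated than plain LRU. The function name `lru_hit_rate` is illustrative only.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, working_set_size, passes):
    """Hit rate of a hypothetical LRU-managed memory tier of `capacity` items
    when a program scans `working_set_size` items cyclically `passes` times."""
    if capacity <= 0 or working_set_size <= 0 or passes <= 0:
        raise ValueError("capacity, working_set_size and passes must be positive")
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for item in range(working_set_size):
            accesses += 1
            if item in cache:
                hits += 1
                cache.move_to_end(item)          # mark as most recently used
            else:
                cache[item] = True               # "fetch from storage" on a miss
                if len(cache) > capacity:
                    cache.popitem(last=False)    # evict the least recently used item
    return hits / accesses


for ws in (90, 100, 101, 150):
    print(f"working set {ws} items, memory 100 items: hit rate {lru_hit_rate(100, ws, 10):.0%}")
```

When the working set fits, only the first pass misses. Once it exceeds capacity by even a single item, this access pattern evicts each item just before it is needed again, so every access misses and goes to slow storage.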

03:56 S1: There is a clearly visible line between volatile and non-volatile technologies, with an equally visible difference in latency. Today we have 128 GB and 256 GB DDR4 modules available, but capacity is still limited and, for most users, the cost is prohibitive. For the decision-makers on the call, there's a difficult trade-off to make here between the cost and performance of the systems we deploy, whether in the cloud or on-premises. Ideally, we want memory capacity to be much larger without costing a fortune, and we want storage to be much faster, with lower latencies, so that applications aren't stalling while reading and writing data. Having larger volatile memory capacities doesn't fully solve the problem, as we still need to store data on non-volatile devices.

04:54 S1: That means constantly moving data between the tiers. For situations such as planned maintenance, unplanned outages or application and system crashes, we must move all the data from slow storage back into memory before we can resume operations. To avoid a complete service outage, of course, we need to design high availability into our application and environment architectures. However, this adds significant cost, complexity and performance overhead to the solution, because it requires additional hardware. This is precisely where Intel Optane technology can help. Intel Optane persistent memory offers higher-capacity modules, up to 512 GB today, at a much lower cost than DRAM. An added benefit, of course, is that it natively persists any data written to it.

05:44 S1: Persistent memory is installed on the same memory bus as DDR memory, so the CPU has immediate access to it. It's also byte-addressable, like DDR, meaning that the CPU only needs to read and write the amount of data that an application needs. It doesn't need to do block I/O, where we read an entire 4K block into memory just to modify a byte and then write the entire 4K block back again. To complete the hierarchy, our Optane media is also available in NVMe SSDs, which significantly reduces the latency of both reads and writes while continuing to provide the typical block I/O interface that we're all familiar with. Performance and latency are significantly improved over NAND technology, thanks to the Optane media, but we still have to deal with the storage stack: the drivers in the kernel, the PCIe bus, etcetera. So there is still latency embedded in the stack, but it's significantly improved over what we have with traditional NAND SSDs. To truly leverage the benefit of persistent memory, though, applications need to be modified to intelligently allocate data between the best memory and storage tiers, depending on its access requirements.
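The contrast between block I/O and byte-addressable access can be sketched as a simple accounting model. It counts the bytes moved to change a single byte. `BlockDevice` and `ByteAddressableMemory` are hypothetical classes written for illustration, not a real storage API. The model assumes 64-byte CPU cache lines, which is typical of x86 processors, and it ignores the page cache, CPU caching, and the media's own internal access granularity.

```python
BLOCK_SIZE = 4096      # the 4K block from the talk's example
CACHE_LINE_SIZE = 64   # typical x86 cache-line size (assumption for this model)


class BlockDevice:
    """Hypothetical block device: data can only be moved in whole blocks."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.bytes_moved = 0

    def read_block(self, index):
        start = index * BLOCK_SIZE
        self.bytes_moved += BLOCK_SIZE
        return bytearray(self.data[start:start + BLOCK_SIZE])

    def write_block(self, index, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("writes must cover exactly one block")
        start = index * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block
        self.bytes_moved += BLOCK_SIZE

    def update_byte(self, offset, value):
        # Read-modify-write: fetch the whole block, change one byte, write it all back.
        index, within = divmod(offset, BLOCK_SIZE)
        block = self.read_block(index)
        block[within] = value
        self.write_block(index, block)


class ByteAddressableMemory:
    """Hypothetical memory-bus device: the CPU stores directly to the target address."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.bytes_moved = 0

    def update_byte(self, offset, value):
        # In this model, a store costs one cache line of traffic, not a whole block.
        self.data[offset] = value
        self.bytes_moved += CACHE_LINE_SIZE


ssd = BlockDevice(num_blocks=4)
pmem = ByteAddressableMemory(size=4 * BLOCK_SIZE)
ssd.update_byte(5000, 0xFF)
pmem.update_byte(5000, 0xFF)
print(f"block device moved {ssd.bytes_moved} bytes, byte-addressable moved {pmem.bytes_moved}")
```

Both devices end up holding the same data. The difference is that the block path reads and writes 8 KiB to change one byte, while the byte-addressable path touches a single cache line.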

07:00 S1: We have great traction today, with several ISVs having adopted persistent memory and integrated it into their software products, and all of the OEMs and ODMs have server products that support Optane persistent memory and Optane SSDs. However, the software journey is just beginning, and it will take significantly more time for the industry to fully enable persistent memory technology. So, do we have to wait? Thankfully, we don't. MemVerge is an industry leader in allowing applications to tier their data between DRAM and persistent memory without needing any application code modifications. For the first time, memory-demanding applications can access terabytes of memory, and we can also deliver application consolidation without sacrificing performance, which reduces overall TCO. Decision-makers are no longer constrained by DRAM prices and capacities. By combining DRAM and persistent memory in our systems with the MemVerge memory engine technology, we can unleash application performance like never before.

08:13 S1: We can't completely remove the non-volatile storage technologies that von Neumann described, but we're taking significant steps in that direction: as we increase memory capacities, we reduce storage requirements. MemVerge CEO Charles Fan will talk more about the memory engine technology his company has developed, and his vision for the future, in a later talk. For now, I'll leave you with some persistent memory resources that you're welcome to check out. There are forums and Slack channels where you can ask questions about persistent memory in general and learn more. Thank you very much. That concludes my talk, and we'll take questions at the end.
