Guest Post

Storage for AI in 2025 and How We Got There

AI applications will become more prevalent, and storage systems will have to be designed specially to meet their needs. Discover what storage for AI will look like in 2025.

00:02 J Metz: Hi there, and welcome to the 2020 Flash Memory Summit. Today, we have a very special panel who is going to talk about the future of storage for artificial intelligence in 2025. And we're going to have a way of looking back from 2025 into how we're going to get there.

My name is J Metz. I am the Chair of the Board of Directors for SNIA, the Storage Networking Industry Association. But I am only your master of ceremonies today, I am only your guest host. The real stars of the show today are our panelists, who happen to be Gary Grider from Los Alamos National Laboratory; Scott Sinclair, who is from ESG; and Dave Eggleston, who is from Intuitive Cognition Consulting. My apologies for stuttering there, Dave. And then last, but certainly not least, we've got Dejan Kocic, who is from NetApp. Let me go ahead and bring us up to speed as to who they are in greater detail, because I know you're all itching to get to the actual good stuff.

01:05 JM: Having said all that, let me start off with Gary. Gary is the division leader of the HPC Division at Los Alamos National Laboratory. He operates one of the largest supercomputing centers in the world and is one of the foremost experts on supercomputing and HPC in general. He's responsible for all of the high-performance computing research, technology development and deployment at LANL, and has a slew of enviable patents, both granted and pending. It makes me patent-envious right now.

01:41 JM: After we talk with Gary, we're going to take a little bit of an adventure into the analysis with Scott Sinclair, who is going to talk about the AI ecosystem. He'll be drawing on his experience in technology analysis, research and validation, and he has quite a long history in the IT industry in both enterprise storage and infrastructure. So, he is perfect for this particular part of the conversation. No pressure there.

02:15 JM: And third, we've got Dave Eggleston. And if I'm going really fast, it's because I know we don't have a lot of time, and you really want to get to these guys and not me. Dave is the principal and owner of Intuitive Cognition Consulting. See, I can say it correctly. He advises companies on memory and storage strategies, both now and in the future, and has a very long and distinguished career, although we're probably all showing our age at this point with the number of years of experience we have. He's worked with everything from small startups to very large companies that you've all heard of and whose products are probably on your desk right now.

02:54 JM: And then, again, last but not least, we've got Dejan Kocic. He is a senior solutions engineer for NetApp, and he also has a great deal of experience in the high-performance computing space. And, obviously, since we're talking about AI, he's going to be talking a little bit more about storage for neural networks and deep learning, what kind of solutions are being used now, and what's going to be available in the next five years.

03:19 JM: Hopefully, that was not too much of a blitzkrieg through everybody's biography, but I am going to shut up now and pass it over to Gary, who's going to talk about how to deal with the flexibility in a storage system. Gary, I'm going to shut up and allow you to take over.

03:36 Gary Grider: Thank you very much. Mostly, I'm going to talk today about how HPC centers typically weren't born AI centers, they were typically born for other applications, usually simulation or something of that ilk. And AI has to be accommodated as a first-class citizen at these sites now, and so how do we move from where we're at today towards storage systems that can handle both or multiple workloads where AI is one of the main ones? Next slide, please.

04:12 GG: We looked at some of the AI being contemplated at the lab and compared it to simulation, graph analytics and other science work that's going on at the lab on large computers, thinking about it in a key-value paradigm just as a way to look at it. The table from that work tries to show the difference between those workloads as the storage system sees them, if you were thinking about them in a key-value sense. You can see that some of them are larger than others, some of them have more reads, some have more writes, some have more IOPS, some more bandwidth and so forth. And some of them have different sizes of keys and the like. So, this was a useful way for us to think about it. Next slide.

05:06 GG: I mapped those onto this little IOPS-versus-throughput chart, which was a simple way to think about it, and tried to place each of those categories of workload onto it. Graph analytics, of course, is large numbers of IOPS, pointer chasing and the like. Simulation, of course, requires just massive bandwidth, terabytes per second or more if it can get it, and modest IOPS, at least today. And then AI models need plenty of bandwidth and plenty of IOPS as well, and so forth. Anyway, this was a useful way for us to think about it, so I put it onto this chart. And then I started to think about, "OK, in the red, what are the emerging technologies that I would use to solve these problems?"

05:53 GG: And you can see that, with graph analytics, it looks like NVMe KVS might be a good answer. Mostly it's point queries and pointer chasing, so you might want pointer-chasing acceleration in the smarts of the KVS, but it's highly threaded single puts and gets and the like. AI models are sort of a mix. They need lots of bandwidth, so you may need to offload and pipeline erasure and compression, and the like, to make up for the fact that storage servers don't have much bandwidth. So, you may need to enable direct-to-smart devices for bandwidth purposes. But at the same time, you also may need some sort of key-value capability because of the IOPS that AI needs and the range queries and range generators on read. Simulation science is very multidimensional, so it needs multidimensional indexes and the like, which is in the NVMe KVS area. At any rate, that's how we thought about it. And then, of course, there are common needs: easy zoning for carving up space on the fly, giving access to user space, not necessarily just kernel space, so that we don't throw away all the IOPS and bandwidth going through kernels, and a good security model for such a thing. Next slide, please.

07:22 GG: The idea, the optimum for us, would be for a job scheduler or orchestrator or the like to get a job and request that all these tens of thousands of NVMe devices or zones configure themselves as KVSs or as smart offload engines for bandwidth or the like, with the job scheduler handing out a key. The key is distributed to all the people who are going to use it, the application opens it up and uses it, and so on. And then the job exits, and the data is stuffed away at a lower tier.

Anyway, that's how we see it and how we would like to think about it. Next slide. There's also a consortium we have that is thinking about these things, and if you're interested in joining, you can look it up on the web. So, take a look. Thank you.
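To make the flow Gary describes a little more concrete, here is a minimal C++ sketch of that orchestration pattern: a scheduler carves NVMe zones into a per-job key-value store, hands out a key, the application does puts, gets and range reads, and the data is tiered down when the job exits. Every name in it (ZoneOrchestrator, KvClient and so on) is hypothetical and invented for illustration; an in-memory map stands in for the devices, and this is not LANL software or a shipping NVMe KV API.

```cpp
// Minimal, self-contained sketch. All names are hypothetical; an in-memory
// map stands in for what would really be thousands of NVMe KV zones.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct KvHandle {
    std::string access_key;          // credential the scheduler hands to every rank
    std::vector<uint32_t> zone_ids;  // zones participating in this job's KVS
};

class ZoneOrchestrator {
public:
    // Ask a set of zones to configure themselves as a KVS (or as a smart
    // bandwidth-offload target) for the lifetime of one job.
    KvHandle configure_zones(size_t zone_count) {
        KvHandle h{"job-key-" + std::to_string(next_job_++), {}};
        for (uint32_t z = 0; z < zone_count; ++z) h.zone_ids.push_back(z);
        return h;
    }
    // On job exit, drain working data to a lower tier and free the zones.
    void release_and_tier_down(const KvHandle&, const std::string& archive_uri) {
        std::cout << "draining job data to " << archive_uri << "\n";
    }
private:
    uint64_t next_job_ = 0;
};

class KvClient {
public:
    explicit KvClient(KvHandle h) : handle_(std::move(h)) {}
    void put(const std::string& key, const std::string& value) { store_[key] = value; }
    std::string get(const std::string& key) { return store_[key]; }
    // Range reads are one of the capabilities Gary calls out for AI workloads.
    std::vector<std::string> range(const std::string& lo, const std::string& hi) {
        std::vector<std::string> keys;
        for (auto it = store_.lower_bound(lo); it != store_.end() && it->first < hi; ++it)
            keys.push_back(it->first);
        return keys;
    }
private:
    KvHandle handle_;
    std::map<std::string, std::string> store_;  // stand-in for the NVMe KV zones
};

int main() {
    ZoneOrchestrator scheduler;
    KvHandle h = scheduler.configure_zones(4096);   // job starts, zones become a KVS
    KvClient kv(h);                                  // key distributed to the application
    kv.put("samples/batch-000017", "...training data...");
    kv.put("samples/batch-000018", "...training data...");
    for (const auto& k : kv.range("samples/", "samples0")) std::cout << k << "\n";
    scheduler.release_and_tier_down(h, "campaign-store://run-42");  // job exits, tier down
}
```

The point of the sketch is the lifecycle rather than the data path: configuring zones, distributing the key and tiering down are scheduler operations, while the application only ever sees the key-value interface.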

08:26 JM: Great. Well, that was extremely efficient. We're going to come back to some of these things. I wrote down some notes about questions. But before we get to that, we're going to move on to Scott and allow Scott to fill out some of the blanks. Scott?

08:43 Scott Sinclair: Hey, thanks, J. I think Gary gave us all a real deep look at AI and what he's doing. From my perspective, as an industry analyst, we do a tremendous amount of research into what enterprises are doing, so I'm going to move you way up into the sky and give you that 50,000-foot view.

When we think about current adoption and the state of AI-based workloads within enterprise environments, one of the big things is that this is a massive growth area. Most of you, if you're watching this, probably have a high level of interest from your business in this space. In research we did earlier this year, for example, 64%, nearly two-thirds, of organizations that had AI said, "Look, we're increasing our investment this year." And candidly, that was before COVID-19. With COVID-19, we are seeing, and likely you are, a huge acceleration in pretty much any sort of digital initiative, AI being one of those. So, we're seeing it across lots of different industries, and I don't think any industry is immune.

09:51 SS: When you think about common challenges, one of the common narratives that I hear within AI, especially among enterprises, is the CTO, CEO, someone in the C-suite, says, "I read about AI. We need AI. How do we do AI? I got it, let's hire a data scientist. Great. We hired a data scientist. We need AI, what do we do? Well, we need a training environment. Let's go look at HPC. Let's go invest in that."

That's all well and good, and they're all valuable, data scientists are insanely valuable, as is the HPC-based training environment, but often the missteps that organizations run into are around personnel and around infrastructure. Around personnel, for example, 63% of organizations that are doing AI say they're being asked to do activities along the data pipeline that fall well outside their areas of expertise. Typically, these are data scientists who find themselves focused more and more on things like data prep and data integration, and having that consume their life.

11:00 SS: There's a joke that I continuously hear, and many of you may be familiar with it: "Ninety percent of a data scientist's job is two activities: cleaning the data and complaining about cleaning the data." That joke actually sets up the next big challenge, which is infrastructure. Gary's slides gave a ton of great insight on why latency and throughput are essential for AI-based workloads, but in focusing on that and delivering that, it's easy to forget how essential performance is across the entire data pipeline. One of the big things we typically see cause an organization to struggle with AI is an over-focus, and it's hard to say over-focus, because it's so important, but they focus on the training and ignore the rest of the data pipeline and how essential it is to have massive capacity, high performance and scale within all elements of that pipeline. That's one of the big challenges, really the second one in addition to the personnel challenge.

12:14 SS: I think a third challenge that I just would hit at is what I call the massive success problem. And this is typically a great problem to have, but it's a problem many organizations run into. It's a fact that, in our conversations, in our research around AI, organizations are often very successful with these use cases and find direct ties between business success and their investment in AI. That's good. If you're listening to this, and you're interested in it, that is the good news. The "bad news," or the challenge that arises from the massive success problem, is you may design your infrastructure and your workload around one or two specific use cases not realizing that six months down the road, 12 months down the road, you're going to have a myriad of new use cases that you did not think of, that your business is going to want you to explore, and you may have gone down a road with your infrastructure that may not be conducive to, "How do you move data across the data pipeline?" Now, it's being generated in different locations. Now, it wants to be analyzed in different locations.

13:28 SS: So, as you think about architecting infrastructure for AI-based workloads, think not just about training; try to think across the entire data pipeline. Also try to give yourself as much capability in terms of performance and scale, and as much flexibility in terms of deployment, where you can invest your money and where data can be created, as you possibly can. If you do that, I think you put yourself in a really good state to solve some of these common challenges.

14:04 SS: And I think that hits on the last point: the state of organizations today, where they are, what they're trying and where they might struggle. I hit on some of the struggles, but I think one of the big things that we see is how early this industry is in terms of enterprise-level adoption. You're going to hear from a lot of speakers, I assume, on this panel, with great insights in terms of technology and how that translated into AI-based success. When we do macro studies across enterprises, it's easy to say there isn't a silver bullet yet, because what we find is a myriad of different implementations and a myriad of different problems. For example, when we ask decision makers and leaders, "OK, what is the weak link in your environment? Here's a dozen different technologies, and you get to choose two," everybody chooses two, and it's spread across the entire gamut of technology options.

15:13 SS: My takeaway from that is, right now we're in a state where . . . Depending on who you are and depending what you're trying, chances are you have weak links, and that really tends to . . . We're still, as an industry, developing what is the magic bullet for these environments and really trying to propagate best practices, and I think that's a lot of what this conversation is about. And I'll end it there, and we can move on to the other speakers.

15:40 JM: Excellent, thank you. I'm really happy that you covered a couple of the things that the data scientists were having issues with because it's something that has been on the back of my mind for a long time. I really hope we have enough time to get to some of that in the future. Now, however, we're going to go on to Dave Eggleston. And, Dave, you're going to talk a bit about some of the GPU, DPU conversations, are you not?

16:05 Dave Eggleston: Yeah. And thank you very much, J. Thanks for allowing me to be part of this panel. And, Scott, thanks for saying that we're in the early stages, which we really are. Now, one of the key things that I want to focus on is that we have this beast, and I like to call it "the beast," where the work of analyzing data is being done, and we're going to feed that beast with data, getting a lot of data in, getting data out. And that doesn't just occur at the CPU anymore. We've got multiple processing units. We've got CPUs; GPUs, which are real beasts taking a lot of data; this new thing called a DPU, which I'm going to talk about a little bit; and then also FPGAs, as well as dedicated ASICs. That's five different types of compute that need data fed to them. That's a different environment than we've had before. One of the good things is that we do have a common fabric that could tie all that together, and that's NVMe-oF, and there will be other speakers in this track who will talk about that and their approaches to it.

17:11 DE: One of the key issues here has been the CPU acting as a traffic cop. Let's think first about just feeding that GPU. If we're getting all the data through the CPU, then it's really the software stack and the CPU that get in the way, and that can be the file system, the block drivers or the I/O drivers. So, one of the near-term things we can do is in software: What do we do to bypass that traffic cop? There are a couple of different solutions here, one of them being GPUDirect Storage, which rests on RDMA and bypasses the CPU as that traffic cop. We see examples of that, and I go into it and some of the results in my talk in the B-9 session in some detail. Nvidia just rolled out GPUDirect Storage, and basically you get about a 3x or 4x improvement in throughput to your I/O subsystem by bypassing the CPU. That gives us some idea of what a GPU can do and how it relates to storage when we address it in software.
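For readers who want to see what that software path looks like, below is a rough sketch of reading a file straight into GPU memory with NVIDIA's cuFile API, the library behind GPUDirect Storage. The call names follow the public cuFile documentation as best I recall them, but treat the details (flags, alignment requirements, error handling, the file path) as approximate and check the current headers before relying on this.

```cpp
// Sketch: read a file straight into GPU memory via GPUDirect Storage (cuFile),
// avoiding a bounce buffer in host memory. Error handling omitted for brevity;
// consult the current cuFile documentation for exact signatures and return codes.
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    cuFileDriverOpen();                                      // bring up the GDS driver

    int fd = open("/data/train.bin", O_RDONLY | O_DIRECT);   // hypothetical path; O_DIRECT required
    CUfileDescr_t descr{};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);                       // register the file with cuFile

    const size_t nbytes = 1 << 26;                           // 64 MiB for illustration
    void* dev_buf = nullptr;
    cudaMalloc(&dev_buf, nbytes);
    cuFileBufRegister(dev_buf, nbytes, 0);                   // register/pin the GPU buffer

    // DMA from storage into GPU memory; the CPU never touches the payload.
    ssize_t got = cuFileRead(fh, dev_buf, nbytes, /*file_offset=*/0, /*dev_offset=*/0);
    (void)got;

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```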

18:20 DE: Now, one of the newest things, or at least a new term, is the DPU, the data processing unit, and that's an idea which is a little different. That's taking hardware and intelligence and pushing it out toward the network card, kind of expanding on the idea of a SmartNIC. We see Nvidia talking about that, we see a new company, Fungible, talking about that, and Marvell also talked about it in a keynote here at FMS, so it's a very hot area: SmartNICs with this capability added. What does that mean? The DPU can manage a lot of different storage, both on premises and off premises; there are different ways we can divide up that data. But the DPU can manage a lot of SSDs, and it has hardware accelerators right there, so it's pushing that intelligence, those hardware accelerators, out to do a lot of different tasks: erasure coding, compression, decompression, et cetera. Again, having it in hardware right there.

19:20 DE: And then a final approach, which I think is farther out in the future (the software approach being near-term and DPUs being kind of mid-term), is, what if we avoid I/O altogether? I listed a couple of different companies that are doing something here. In my talk, I go into a company called Groq, which is building a very large inferencing chip where they just put a ton of SRAM, 220 megabytes of SRAM, on a huge chip. That feeds the MMUs right there to do the fast operation, so that's one way to feed that beast.

19:55 DE: Another way to do it is the way Penguin, an HPC specialist, did it. They've taken Intel's Optane persistent memory and put it in their system to replace SSDs. As an example, with the Facebook deep-learning recommendation engine, that was a 10x speed-up in inferencing. And a key thing is they used a company called MemVerge, because using Optane persistent memory can be hard, but with MemVerge, they could virtualize the interface out to that memory. They were able to get that 10x speed-up without rewriting their app. I think that's going to be a main thing in implementing these new memory technologies.

20:36 DE: And then, finally, Intel even talked about it in their keynote yesterday here at FMS. They have a new concept for when you have small and misaligned blocks of I/O: What do you do with small and misaligned I/O that doesn't meet the block boundary? What they did is come up with a distributed asynchronous object store engine that sends the big blocks to SSDs, just like you normally would, but sends the small and misaligned blocks to the Optane persistent memory DIMMs. That's something new, because those DIMMs are not block-aligned devices, so they can handle those small amounts of data. And when they did that, they leapt immediately to the top of the HPC IO-500 list in their performance. That's a really new concept of breaking up the I/O that we need to do.
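The routing idea behind that engine can be shown in a few lines. The sketch below is not Intel's or DAOS' actual API; it only illustrates the decision Dave describes, with large, block-aligned extents going to SSDs and small or misaligned extents going to byte-addressable persistent memory. The block size and threshold are assumptions.

```cpp
// Illustrative routing only; not the DAOS API. Large, block-aligned extents go to
// SSDs, while small or misaligned extents go to byte-addressable persistent memory,
// which has no block-boundary requirement.
#include <cstddef>
#include <cstdint>
#include <iostream>

constexpr size_t kBlockSize = 4096;          // typical NVMe block size (assumption)
constexpr size_t kSmallIoThreshold = 4096;   // tunable cutoff for "small" I/O (assumption)

enum class Target { NvmeSsd, PersistentMemory };

Target route_extent(uint64_t offset, size_t length) {
    const bool aligned = (offset % kBlockSize == 0) && (length % kBlockSize == 0);
    const bool small = length < kSmallIoThreshold;
    // Small or misaligned writes would otherwise force a read-modify-write on the
    // SSD; byte-addressable persistent memory absorbs them directly.
    return (small || !aligned) ? Target::PersistentMemory : Target::NvmeSsd;
}

int main() {
    struct { uint64_t off; size_t len; } extents[] = {
        {0,    1 << 20},  // 1 MiB, aligned  -> SSD
        {4096, 512},      // 512 B, small    -> PMem
        {100,  8192},     // misaligned      -> PMem
    };
    for (const auto& e : extents)
        std::cout << e.off << "+" << e.len << " -> "
                  << (route_extent(e.off, e.len) == Target::NvmeSsd ? "SSD" : "PMem") << "\n";
}
```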

21:33 DE: And, so, I think this memory approach is going to take quite a bit of time before we get there, but these are some of the different ways: a software approach near-term, with RDMA and GPUDirect Storage; the DPU mid-term, pushing that intelligence out to the network card; and then long term, "Hey, let's just not do I/O at all, let's pull it right into the memory as much as we can." Back to you, J.

21:57 JM: All I can say is that I wish I had another hour-and-a-half to explore some of these concepts. There's so much going on. And a lot of the stuff we actually predicted and foresaw in Flash Memory Summits of days past is now coming to pass. The computational element of storage, avoiding the I/O altogether, this was stuff we talked about five years ago, and here we are predicting about what's going on in the future. Speaking of the future, Dejan is up to talk about data movement and processing. Dejan, please, take it away.

22:33 Dejan Kocic: Hey, J, hey, everyone. Thank you for having me. We have heard many valuable and important points so far. When it comes to AI, one thing is really important, obviously, and that is data. AI requires a lot of data, so much data that, as we have heard so far, our existing infrastructure has a hard time coping with it in terms of ingest, in terms of storage and processing, and later, when it comes to inference, in terms of real-time throughput and latency. These are some really big challenges for AI environments, especially when those environments start to grow. Another thing that is really important to recognize is that many of the processes used for AI that have been manual, for example, image labeling or data labeling, are now becoming automated, of course, using AI.

23:36 DK: Another really important thing that I think is going to start happening more and more is data reduction, in the sense that a lot of the data used for AI, especially when it comes to training, is duplicate, redundant or in some other way not necessary. For example, we have some information about data gathered by survey cars. These are the cars that go around the streets collecting information about roads, traffic signs, intersections and other things that are needed to create maps and to create self-driving capabilities. One of those survey cars usually collects about 2 petabytes of data per car, per year, and that translates to over a billion images per year. But only about 3 million useful images are actually needed, because a lot of the data is duplicated, in the sense that a stop sign looks the same everywhere in the U.S., and probably around the world, and traffic lights, for example, look pretty much the same.

24:54 DK: And there's so much overlap that, if that data doesn't have to be processed or, in other words, if it can be removed at the source, it helps to alleviate some of these problems in terms of data storage and getting data to the core site for processing, and it just takes a lot fewer resources. This is one thing that computational storage can actually play a big role in, and I think computational storage is going to grow more and more in the future. It also helps the overall system and the efficiency of the machine learning and AI flow, so that less data is needed to complete the entire flow of data, for example, from edge to core to cloud, or whatever combination of those is used. That is one thing that I consider is going to be important in the future.
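As a toy illustration of the source-side reduction Dejan is describing, the sketch below drops exact-duplicate frames by hashing their contents before they leave the vehicle. Real pipelines go much further, using perceptual or model-based similarity (two photos of the same stop sign are rarely byte-identical); the class and field names here are purely illustrative.

```cpp
// Toy sketch of source-side data reduction: skip frames whose content hash has
// already been seen, so duplicates never travel from edge to core. Real systems
// would use perceptual/model-based similarity rather than exact byte hashes.
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

struct Frame {
    std::string camera_id;
    std::vector<uint8_t> pixels;   // raw image bytes captured by the survey car
};

class EdgeDeduplicator {
public:
    // Returns true if the frame is new and should be uploaded to the core site.
    bool should_upload(const Frame& f) {
        const std::string_view bytes(reinterpret_cast<const char*>(f.pixels.data()),
                                     f.pixels.size());
        const size_t digest = std::hash<std::string_view>{}(bytes);
        return seen_.insert(digest).second;   // false if this content was seen before
    }
private:
    std::unordered_set<size_t> seen_;
};

int main() {
    EdgeDeduplicator dedup;
    Frame a{"cam0", {1, 2, 3, 4}};
    Frame b{"cam1", {1, 2, 3, 4}};   // identical content from another camera
    Frame c{"cam0", {9, 9, 9, 9}};
    for (const Frame& f : {a, b, c})
        std::cout << f.camera_id << (dedup.should_upload(f) ? ": upload" : ": drop") << "\n";
    // Prints "cam0: upload", "cam1: drop", "cam0: upload", one per line.
}
```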

25:58 DK: And then another one is the data interconnect. Most data transfer today is done over PCIe or some form of PCIe extension, whether we are using RDMA-type protocols with NVMe over Fabrics or others, but PCIe, in terms of its architecture and the way it operates, is kind of reaching its limits. And then we have some other solutions, none yet on the market, but coming to the market with standards being solidified, such as Gen-Z, CCIX, OpenCAPI and others. I would think that these new data transfer protocols and data interconnects will play more significant roles in the future simply because they will help to reduce latency, to a large extent or at least to some extent, and also allow much higher throughput of data from one point to another. That's all for me.

27:21 SS: J, we need you to come off mute there, sir.

27:25 JM: Thank you. Color me embarrassed, oops. You probably just saw my mouth going like this. [chuckle] You can hear me now though OK, right?

27:35 SS: Yes.

27:36 JM: All right. I took off the video so that we could see our bright, shining, smiling faces, but not my audio apparently. I have a couple of questions. I took some notes that I jotted down, I'd like to try to get to them. But before we do, there is a question that came through for you, Scott, and I'd like to address the question from the audience first.

The question is . . . Paul Sherman writes about the lack of being generic, the massive success problem, and wants to know, "Abstract or common design . . . " Can you tell I'm reading this now? "Abstract or common design has a long, slow development time. How might you suggest balancing specific concrete time-to-market forces versus a more slow start yet exponential growth of innovation?" It sounds like a doctoral dissertation question.

28:25 SS: It does, it does. The way I read this is, how do I balance taking the time to architect the right answer versus doing it in the timeframe my boss wants me to accomplish it in? That's how I oversimplify it, but I could be wrong. And candidly, it's a challenging problem. I said this earlier: It's early, there are no silver bullets. I think both Dejan and Dave had some interesting insights into various technologies that are emerging. To help address this problem, one of the things that I see with organizations that tend to be more successful is a balance: In terms of projects, try to define bite-size projects, or try to make them manageable in terms of your goals and what you're trying to achieve. But when it comes to investments, try to err on the side of as much scalability and flexibility as you can possibly have: scalability in terms of capacity sizes and in terms of performance.

29:35 SS: There are lots of different technologies out there; some are more software-based and give you more flexibility, some are more hardware-constrained. And a lot of this is focused on storage because, in these environments, storage tends to be one of the big problem areas in terms of scale. For example, one of the big things is, if you think about the various stages of the data pipeline, where the data is collected, where it's prepped, where you select it for training, where you do your model development, keep in mind that, as you start leveraging AI for more use cases, all those activities may change locations. And if they do, what does that look like? In certain places, maybe at the edge where before you were just doing data collection, maybe you want to start doing some very small model development out there for something else that needs a very short turnaround.

30:34 SS: These are things to just comprehend as you think about your infrastructure selection. But again, it's very early, and it's hard to have a silver bullet, because a) the technology is early, but also every enterprise is different in terms of what they do, and the data that they're collecting, and what their overall environment looks like. So, it's tough to have one answer for everyone.

30:58 JM: Well, I think that dovetails back to what you were saying earlier, in that the expectations around data science were wildly inaccurate. I honestly think that people genuinely expected they could hire somebody, call them a data scientist, and expect them to understand how to clean data and the necessity of it. Any grad student will tell you how much fun cleaning data is. But it needs to get done, and there's no button you can push, because you don't know what the data is supposed to look like until you actually go through it. We've got a mismatch in our expectations as to how long that takes to do and where you're going to get some ROI from the results of it.

31:40 JM: And that actually dovetails into one of my key questions, which is for everybody, but I want to start with you, Gary. In your presentation, you were talking about the importance of key-value stores and the performance, throughput and bandwidth elements that go along with them. One of the problems with key-value stores is that, in the past, they haven't jibed with POSIX API-based enterprise workloads and infrastructure, so they've always been separated out from the way they could fit into the enterprise. So, the question really becomes, "Is there a necessity or a need for a separate Venn diagram between the HPC world and the enterprise world?" Because as we start to collide in that fashion, where we've got some solutions that work really well for HPC, some not so much for enterprise, is there an overlap? Do they need to be separated? Do they need to be identified properly? How do you see that working out?

32:42 GG: That's a good question. We're pushing on key-value stores primarily because our data sets are huge, petabytes typically. We can generate those ad nauseam, one every several minutes, if we wanted to. Moving that kind of data around, even with terabyte-per-second, $100 million networks and the like, still takes a long time. And if you can avoid moving the data by moving the process to the data, as you talked about, you win. Many times in our world, and I think in many worlds, subsetting to get a smaller answer back is far better. It's a simple thing to do. The win is big, sometimes as much as 1,000x or 10,000x. And, in those cases, it's worth rewriting your app. No matter if you're HPC, or cloud, or enterprise, if you can get 1,000x or 100x, it's worth it. If it's 2x, you're probably not going to do it, right?

33:55 GG: And the same thing goes for computational storage, offloads for bandwidth, not just key value. You're not going to bother changing your code a little bit to be able to take advantage of offloaded erasure or offloaded compression, or the like, unless there's a big enough win there. Because really the whole cost of all this is the applications. We probably have $10 billion in our applications at Los Alamos alone, and we certainly don't spend that much on IT. And, so, I think your question really comes down to "Who's going to change their applications and to what?" And the answer to that is economic, it's "where do I get the win?"
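Gary's subsetting argument is essentially predicate pushdown: ask the storage to return only the slice you need instead of shipping petabytes to the compute and filtering there. The sketch below contrasts the two paths; SmartStore is a hypothetical stand-in for whatever filter or range capability a computational-storage device or key-value store would actually expose.

```cpp
// Schematic contrast between host-side filtering (move all the data) and
// pushing the predicate down to the storage (move only the answer).
// `SmartStore` is a hypothetical stand-in for a computational-storage device.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

struct Record { uint64_t timestep; double temperature; };

class SmartStore {
public:
    explicit SmartStore(std::vector<Record> data) : data_(std::move(data)) {}

    // Naive path: ship every record over the network, filter on the host.
    std::vector<Record> read_all() const { return data_; }

    // Pushed-down path: the device applies the predicate and returns only the
    // subset, so a far smaller answer crosses the wire.
    std::vector<Record> query(const std::function<bool(const Record&)>& pred) const {
        std::vector<Record> out;
        for (const Record& r : data_)
            if (pred(r)) out.push_back(r);
        return out;
    }
private:
    std::vector<Record> data_;  // stands in for petabytes of simulation output
};

int main() {
    SmartStore store({{0, 300.0}, {1, 512.5}, {2, 9001.0}, {3, 250.0}});

    // Host-side filtering: everything moves, then most of it is thrown away.
    size_t hot = 0;
    for (const Record& r : store.read_all())
        if (r.temperature > 1000.0) ++hot;

    // Pushdown: only the matching subset is returned by the store.
    auto subset = store.query([](const Record& r) { return r.temperature > 1000.0; });

    std::cout << "host-side hits: " << hot << ", pushed-down hits: " << subset.size() << "\n";
}
```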

34:38 GG: I think one of the big things that I see as being problematic in all this is that you've got processing everywhere, as was mentioned earlier, where there are DPUs and there are CPUs, and there are GPUs, and there are FPGAs and there are ASICs all over the place. And we've been living in this world for a while, but only really in the compute area, not in the storage area. We've had compute in the network for a number of years. The question really for us is, "How do you contemplate programming in an environment like that, one that makes use of all that stuff? How does a programmer contemplate it?" For us, it's horrible. They've got to contemplate million-way parallelism, and they've got to contemplate multiple tiers of memory. And now we're telling them that they've got to contemplate heterogeneity across about eight different kinds of processors and use heuristics to try to figure out where to put what kind of processing. How do you even contemplate thinking about that problem?

35:34 GG: And right now, what's happening, I think, is that the standards bodies are off having examples written for them by people like me and people like Amazon and others, who are saying, "Let's push the standard a little bit in this way, let's push the standard a little bit in that way," by writing up an example and showing them, "Look, there's a 1,000x win here, this is great." But we're not really to the point of thinking about how you look at the whole thing, how you program a beast that's got all those parts in it. There's some good research going on at universities. There's a thing going on at Carnegie Mellon on exactly this: how do you program in a completely, totally heterogeneous way.

36:17 GG: IBM had done some interesting work long ago for the NSA that finance people picked up on in a product they call InfoSphere, which is essentially used by the finance industry. They build graphs of processors, put a little bit of processing here and a little bit of processing there, and funnel the data through this graph to try to figure out what's going on. But all that stuff is still just research when you look at it from a super-heterogeneous point of view. What I think is we really need to spend some time and effort thinking through how we're going to do it from a larger point of view. There's good academic work, and I think that's the last thing that will come from standards. We've got to walk before we run, and that's the Holy Grail: How do you program a beast that's got all that stuff in it, and millions of them at the same time? That sounds hard to me. I'll probably retire before we get there.

37:13 JM: Somebody wrote into the chat that you're right on, and you got a fan, obviously. We are running out of time, but I do want to pick up on one thing and ask Dave and Dejan to chime in in the few seconds that we have left. Gary mentioned computational storage and the promise that it holds here. Could that be the lingua franca that we use to try to bridge some of these elements? Is that an area where we can focus on as a potential avenue?

37:47 DE: Yeah, I think that's a mid-term gain, very similar to the DPU. The idea of the DPU, as I described it, was taking some intelligence and pushing it to a different place. Computational storage is that same idea of taking that intelligence and some hardware acceleration and pushing it, in this case, into the storage unit, in some cases the SSD controller itself. I think computational storage is still trying to find its footing. It hasn't had the commercial success yet, but it shows a lot of promise. I think some of the startup companies are unlikely to get there, but some of the larger companies that pick up these ideas and have more of the long-term view, that's where it'll go to be successful. Dejan, what do you think?

38:28 DK: Yeah, I agree. And I think computational storage, in addition to the other technologies we talked about today, can help to drive some benefits. Even if we get a few percent of benefit using computational storage in terms of data reduction, those few percent could multiply many, many times, depending on the amount of data. So, little gains in different parts of the system can result in a bigger overall gain for the entire system. I definitely think that computational storage is going to play a big role in the future just because of the nature of the technology. But what is also important in the AI/machine learning/HPC world in general is latency and throughput. I strongly believe that the new generation of high-speed, high-throughput, low-latency interconnects is going to play a vital role in AI/machine learning workloads.

39:23 JM: Well, I have to get a little plug in for SNIA, because we are working on computational storage technical work, trying to identify the proper use cases that can be developed both inside SNIA as well as in other organizations like NVM Express.

39:42 DE: And, J, let me do a quick shout-out: The SNIA Persistent Memory Summit is going to add computational storage next year. I think that's coming up in April. And it's pretty interesting to see these two technologies get combined and some of the solutions that can come out of it.

39:54 JM: Absolutely. It's actually quite exciting. And, so, I want to encourage people to reach out and ask more questions. One of the things we don't have, by the way, is a very strong HPC component in that working group for computational storage. I'd like to encourage somebody -- somebody perhaps on the panel or somebody perhaps who knows somebody on the panel -- to consider contributing or participating at some level, in some fashion, to computational storage for HPC. Because, joking aside, it is a notable gap in the conversation. Last chance: Any additional comments or questions or points of view that you'd like to share, gentlemen?

40:48 DK: I just wanted to say one thing to kind of add to the debate: AI overall, and high-performance computing, could change direction rapidly, in the sense that if one of the technologies proves to be more useful than the others, then everybody may jump on that bandwagon. Since we are still early in development, it's a little bit unpredictable, so it's one of those things where it remains to be seen what's going to happen. But there are a few different areas with problems, which we covered today, so we'll see which direction things are going to go.

41:29 DE: Yeah. To build real quickly on top of something Dejan just said, one of the things that caught my eye was a chart from Marvell showing that AI model complexity has grown 50x in 18 months. They compared it to transistors, where that same 50x gain took 10 years. It means we can't just throw hardware at this. We have to rethink the architecture, and it has to involve software.

41:54 JM: Pretty soon you're going to run out of nanometers. [chuckle] OK. Gentlemen, thank you so much for your attendance at the Flash Memory Summit this year. It was incredibly enlightening and, in some cases, quite validating for some of the things that I've been thinking about for several years, so thank you so much for attending and joining us today.
