My name is Meow Yee. I'm the president of the Open Power Foundation. I wear another hat as well: I'm also a director in the IBM Power Systems Division. But today I'm on stage to talk about this project. On this screen you'll see the players that are collaborating on it. It was launched by the Open Power Foundation, so let me talk about the foundation first. This non-profit organization was launched with several players, including IBM, to build and facilitate an open ecosystem around the Power architecture. What we have launched here is a collaboration among several players to build the first ever high-performing, server-grade system utilizing the DC-MHS open industry standards, targeting hyperscaler and edge use cases. Jabil, and you'll hear more about this from Todd, is one of the collaborators here. They are an ODM, and this project will validate their multi-architecture design capabilities. Raptor Computing Systems is not here today, but you'll hear about what they're contributing to this project. They provide fully open, fully owner-controlled, blob-free hardware systems to their clients, and this project will highlight and showcase their open source DC-MHS solution. As a member of the Open Power Foundation, they're also developing a generation of Power processor using the Power ISA 3.1. I should also mention that the Open Power Foundation has assets that were contributed by IBM, and not just IBM; other players have also helped contribute and develop more open assets around the Power architecture. So we have open hardware, open software, open firmware, and at the lowest level you have the Power ISA, the instruction set architecture, for which IBM granted the associated patents royalty-free for the open community to use. Raptor is utilizing that to build a processor for their business. And last but not least is Wooden Data Center, a pioneer in highly sustainable data center solutions; Karl Rabe provided the germ of the idea that launched this project. At the bottom you see SAP. Ali Pahlevan from the SAP Innovation Center provides the end-user perspective to this group. And now Todd will talk about what comes together as this solution.
All right. Thank you, Meow. My name is Todd Rosedahl. I work for Jabil, and we are a company that has a DC-MHS-inspired chassis, which is our part in this. Meow gave you a good overview of the players and how we're collaborating, but what were we doing? What was the point? The initial motivation was that somebody came to Karl from Wooden Data Center and said, hey, we want you to try to make a really large memory system. And there are a lot of use cases for this. We've all heard about ML and AI and how much memory you need. Granted, a lot of that is done with HBM, with GPUs and ARM cores talking to it, but there are also use cases for a single CPU with a massive amount of memory, using it for inferencing. In-memory databases such as SAP HANA want as much memory as they can get; they don't care as much about the compute. How much memory can you pack in this server? Because they need a massive database. And then there are some new things coming with virtual quantum computing that also require a lot of memory. So we said, all right, we have that motivation.
So our goal as a group was to develop what we're calling a large memory server. We want to pack in as much memory as we can; we thought we could get maybe 32 terabytes in a 19-inch sled, and we'll see how that worked out as we go. We realized we can't use direct-attached DDR; you're just not going to get that much memory in there that way. So we needed to utilize some of the new memory technologies, and I'll talk about those; I have a slide on it. OMI and CXL are a couple of those new technologies. We wanted to use open standards to give us a leg up so we weren't designing this from scratch, which is one of the reasons we used that MHS chassis, and Raptor Computing was going to do the SCM and HPM. They would just use our chassis, to get this thing going as fast as we can and use everything open source. Open standards and all open source firmware was the goal.
A little bit about MHS: if you want to hear more about this and understand it better, there are a couple of talks tomorrow by Rob Nance, who also works with me at Jabil; he'll get into it in more detail. But as an overview, it's an initiative within OCP to create modular chunks of the server to make it much easier for everybody to develop servers. For instance, we'll focus on the HPM and the SCM. The HPM, the host processor module, has the processors and the memory, and the SCM is the secure control module; it has the BMC, the root of trust, and all of your out-of-band I/O. There's a defined interface between those two entities, so the goal would be to take an off-the-shelf HPM or SCM, stick it in the chassis, and it should run. There's a lot of work ongoing in the industry to make that happen. What we did is take an existing chassis; Raptor is going to take that, make an HPM and SCM, stick them in, and try to make this work.
Here's an exploded picture of that; let's see if my pointer works. Not very well. All right, so you can see the various components. There are a few board form factors that can be used with MHS; this is the full-width board, all the way wide, because we want to get as much memory on there as we can. There are smaller form factors as well, but we didn't choose those for this project. You can see that with direct-attached DDR you can get up to 24 DIMMs, and we picked 256 gigabytes for this because it's a better price point; you can go up to 512. In this case you can get 6 terabytes, and that's pretty good, but you have to do better than that. At least our goal was to do better than that.
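A minimal sketch of that direct-attached capacity arithmetic, using only the DIMM counts and sizes quoted above (the helper function is illustrative):

```python
# Direct-attached DDR capacity, as described above.
# 24 DIMM slots and 256 GB / 512 GB DIMM sizes come from the talk.

def capacity_tb(dimm_slots: int, dimm_gb: int) -> float:
    """Total memory in TB, treating 1 TB as 1024 GB to match the round numbers in the talk."""
    return dimm_slots * dimm_gb / 1024

print(capacity_tb(24, 256))  # 6.0  -> the 6 TB quoted for 256 GB DIMMs
print(capacity_tb(24, 512))  # 12.0 -> the ceiling if you paid for 512 GB DIMMs
```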
Then here's an OMI and CXL overview. At the top you've got, again, what it looks like when you just have DDR attached. You run out of beachfront on your CPU; you can't get any more DIMMs. But you can do 12, and you can get that 256. Or you can put a serial buffer in there, which is what both OMI and CXL do: you put in a memory buffer that is serial-attached to your CPU, and behind that you have your standard DDR4 or DDR5, whatever you want. It does add a little bit of latency, both of these would, but it's not much, about 4 nanoseconds. OMI started out as the OpenCAPI Memory Interface and then got renamed the Open Memory Interface. That standard now sits underneath CXL as a spec, so they're kind of all together, but they're definitely different technologies. OMI is used only for this use case right here, basically memory expansion. CXL, if you've heard of it, has a number of use cases; this is one of them, but it can also be used for memory sharing and for switches. One of the things it can do is this sort of memory expansion design.
So what would it look like if we could pack all this memory on the HPM? Well, you'd have up to eight OMI channels per CPU, and then eight DDR DIMMs behind each of those, so you're packing a lot of memory on there: 16 terabytes. That's with a single-chip module. That CPU could be a dual-chip module (DCM), and if it was, you'd be able to double that by using 512-gigabyte DIMMs; in theory, you could get 32 terabytes in here. But the question is, can you fit it? Can you power it? Can you cool it?
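As a sanity check, here is the same arithmetic for the OMI-attached case. The eight channels per CPU are from the talk; reading "eight DDR DIMMs" as eight behind each channel, and "double it" as swapping 256 GB DIMMs for 512 GB ones, are my assumptions:

```python
# OMI-attached capacity sketch. 8 OMI channels per CPU is from the talk;
# 8 DIMMs behind each channel and the 256 GB -> 512 GB doubling are assumptions.

def omi_capacity_tb(channels: int, dimms_per_channel: int, dimm_gb: int) -> float:
    return channels * dimms_per_channel * dimm_gb / 1024

print(omi_capacity_tb(8, 8, 256))  # 16.0 TB, the single-chip-module figure
print(omi_capacity_tb(8, 8, 512))  # 32.0 TB, the theoretical ceiling mentioned
```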
This is a pretty fresh chart; we just had this layout done, and I think we can optimize it. But you can see that in this case we have six memory buffers, and each of them has four DIMMs next to it, so you get a total of 24 in this particular layout. With 512-gigabyte DIMMs, that gives you the 12 terabytes. It still enables your PCIe slots, which you can see up front, and which you may or may not want depending on your use case. But that's not enough. So another way you could do this is to run those channels off of this board onto the E3.S expansion bays in the front and basically make memory modules there instead of SSDs, and then be able to pack the 16 terabytes into this one 1U. And if you wanted to pair this coherently with another one above it, you could do that to get 32 terabytes in a 2U.
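A quick tally of the layout options just described; the buffer and DIMM counts and the 16 TB-per-1U E3.S figure are from the talk, and the labels are mine:

```python
# Capacity by layout option, using the figures quoted above.
layouts_tb = {
    "on-board only (6 buffers x 4 DIMMs x 512 GB)": 6 * 4 * 512 / 1024,  # 12 TB
    "with E3.S memory modules up front (per 1U sled)": 16,
    "two 1U sleds paired coherently (2U)": 2 * 16,
}

for name, tb in layouts_tb.items():
    print(f"{name}: {tb:.0f} TB")
```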
This is what it looks like in cartoon form. You've got your three buffers up top, and then you're driving that OMI link down onto those E3.S bays, which would normally be hard disk drive bays, but now you'd be using them for memory, again getting up to 16, potentially 32, terabytes.
So, can it be powered? We're now talking about powering it in the existing chassis, the one that's going to be available on the OCP Marketplace. It can, but not redundantly. If you look at this, notice there's only 100 watts budgeted for the CPU. That's because for these use cases we really don't need a high-power CPU; we just want that memory and something to feed it slowly, for an SAP HANA instance. If you wanted a higher-power CPU, there's headroom: at 1,541 watts, you've got to stay under 2,000 for the PCB. It's not redundant, again, with 1,300-watt supplies; you'd have to do something else if you wanted redundancy. So you can definitely power 16 terabytes, even if you add in PCIe and SAS.
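A sketch of that redundancy check, using only the wattages quoted above; the assumption that the chassis has two supplies is mine, since the talk only says "1,300-watt supplies":

```python
# Power budget check with the figures from the talk.
estimated_load_w = 1541      # total estimated draw, including the 100 W CPU budget
board_limit_w = 2000         # stated ceiling for the PCB
psu_w, psu_count = 1300, 2   # supply size from the talk; count of two is an assumption

headroom_w = board_limit_w - estimated_load_w             # 459 W left for a bigger CPU, PCIe, etc.
redundant = estimated_load_w <= (psu_count - 1) * psu_w   # False: 1541 > 1300, so no N+1
powerable = estimated_load_w <= psu_count * psu_w         # True: 1541 <= 2600

print(headroom_w, redundant, powerable)  # 459 False True
```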
But can you cool it? Cooling is the bigger challenge here, and I'm not going to run you through all the numbers, but you can see that we can cool 12 terabytes, and that's with no further optimizations. If you used 512-gigabyte DIMMs, you'd get better performance per watt, but obviously price is a factor. Other things can be done to optimize; this is a back-of-the-napkin kind of calculation. But based on that calculation, I'm pretty sure that with a 2U you would be able to cool the full 16 terabytes. And you've heard a lot about liquid today, right? Liquid, liquid, liquid. If you moved to liquid cooling, that would help here as well, and you wouldn't even need it as badly as you do with those Grace Blackwell GPUs that are coming, which was one of the earlier pitches.
So what's the conclusion? We think that in order to get this massive memory, you're going to need these new memory technologies, OMI or CXL, take your pick. There's a pretty good article out there if you follow that link; it talks about the differences between HBM, DDR, and OMI, their pros and cons, and which way you'd want to go when designing a memory system. I'd say read that if you're interested. And then there are the open standards. We're big believers in MHS; we think it's going to help the rate and pace of all this server development. I think we're going to be able to stand this up fairly quickly because we have the standards and we have the chassis. When you go to develop an HPM and SCM, all your interfaces are already defined; you don't have to develop any of that from scratch. Just follow the standard.
So, the call to action. If you want to be involved in projects like this, contact Meow from the Open Power Foundation; we've got a lot of these things running, and you can be involved. You can get involved with OCP and the MHS workgroup if you're interested in that piece. We need help defining all those pieces; it's not done, and there is actually a lot of work still to be done there. If you go to the OCP Marketplace soon, you'll be able to see a link to that system from Jabil that I was showing you. You could buy it and play around with it. And if you want more information on Raptor, as Meow was saying, they're the ones doing the actual HPM and SCM, and they have some open source solutions out there that you can explore. And that is it. Any questions?
Thank you.
Two questions. One is, it looks like if you could manage the power states of the memory, you could actually get N+1 redundancy. Have you looked at that?
We didn't really look at that; it was back-of-the-napkin. I think you're probably right, if you were able to go into lower power states. That number was just an average, the kind of power we assess that memory is going to take running flat out.
The next question is really for the Open Power Foundation. I think the Open Power Foundation has been on this journey to completely open source the platform, from the ISA to the system, and it's been about two and a half or three years since you started on that journey. So this is a huge step forward. I remember it was a year and a half ago when the secure control module was open sourced, with some parts and things. It seems like you're getting close to the end goal of having a completely open source platform, so my question is: what's left? Or maybe, what's next? Because it seems like all the ingredients have finally gotten to that point.
Yeah, the goal is, you're right, a completely open source platform. And what's next is encouraging other people to adopt and build these systems. Raptor is certainly one company, but what we'd like to see is a big ecosystem of people taking off and making large memory systems like this one, if they have a use case for it, and doing it in the open, building and contributing. That's what we want to have happen.
Just a suggestion: you heard at one of the keynotes this morning about the work going on with the Barcelona Supercomputing Center, and in our rehearsals yesterday we heard that Universal Baseboard compatibility has become an issue they uncovered very recently. I know Jabil is working on the modular architecture, the MHS architecture, so something to think about is how we, as a collaborative organization, try to get in front of that interoperability so we have a really strong ecosystem.
Yeah, let me just jump in here to answer your question. In the initial stages, we had these open assets that were part of the Open Power Foundation, and members were leveraging them and building their own products on top. This is a really unique project in that we've gotten several players to come together to build a solution with components that come from different players but can all be slotted in. And as Todd is saying, there could certainly be another player who wants to build an HPM that could also slot in there. Raptor is certainly fully invested in getting this one done, because they very much subscribe to the full open system stack. We are also looking at other opportunities; the Open Power Foundation is collaborating with OCP in terms of leveraging their standards. The more we come together, the faster we can build these solutions for the general community.
It's nice to see the progress that you guys have made.
Yeah, thank you. We should talk offline about the Barcelona thing, because it would be good to get ahead of that, like you say. If you can use MHS hardware, you can just swap pieces in and out; you don't have to throw away your entire chassis, for instance, you just update your compute. Is that where you were going with it?
And we also have members that are building new processors using the Power ISA, so that's right down to the chip level. That's another area.