It's bright up here. Well, thank you, Frank. I'd like to take a minute to thank Charles and Frank from MemVerge for putting this track together. I know it takes a lot of work. They're very open and generous with their time and resources, not only at OCP events but at others as well, such as FMS and Supercomputing. So thank you, guys.
I'm just going to start out with a short infomercial for SMART. We've been around for 35 years. We've got 3,700 employees worldwide and two major factories, one here in the Bay Area and the other in Penang. We've got multiple teams of engineers working on many different kinds of flash and memory products, and we support many different types of applications and customers.
We have three different business units. I'm in the Memory Solutions business unit. We also have Intelligent Platform Solutions, which consists of Penguin, Stratus, and also Inforce and Artesyn Embedded, which aren't shown there. And the third group is Cree LED, which builds specialized LED solutions for stadiums and so forth. These segments and business units are focused on specialty markets, providing unique solutions and solving customers' problems.
All right. So to get to this point, with CXL and all the discussions and engineering focus, it's been a long path. These are all products that we've developed. I'll start at the top. We have an Optane add-in card using Optane DIMMs, so that was up to two terabytes. On the right is a CCIX E1.L module; that took quite a bit of development work with CCIX and also belonging to all these consortiums and groups. Next to it is a Gen-Z module. We had three generations of the Gen-Z product. We had high hopes for Gen-Z and put a lot of effort into it, and thanks to everybody else who developed the standards for Gen-Z, it did look very promising. On the bottom, those three are NVDIMMs. We had a DDR3 NVDIMM, a DDR4 NVDIMM, and we started on a DDR5 NVDIMM, but that's actually now moving into CXL. So it's all coming together with CXL, and I see a lot of momentum in the industry. It's really nice to see; this doesn't come easy. And the one on the top left is a DDIMM; that's with OpenCAPI. So we're building DDR4 and DDR5 DDIMMs in three different form factors. All of that takes a lot of work. The complexity of the memory is just increasing as you go from parallel-attached to serial-attached memory with either controllers or FPGAs on these memory modules. It's very complex, and the testing required takes a lot of know-how. So I think having this experience over the last 10 years and more really helps us get to where we are now.
We've heard a lot of detail about the applications, so I'm not going to go too much into that. For AI and machine learning, the large databases needed for these data sets for imaging and facial recognition and so forth just keep getting larger and larger. Same thing on the right with in-memory databases: online transaction processing requires more and more memory. And what we're seeing now is a transition to CXL, and among the modules shown there, you've got several different types of add-in cards and also the pluggable modules, the E1 form factors and the E3.S 2T.
All right. So this is part of the problem statement. We've seen this in many versions already. This is an example of a Sapphire Rapids processor. It has up to 60 cores and four memory controllers, and the challenge here is accessing the memory. There's also the pin count: the more memory controllers you add, the more pins you need, and that all adds cost, power, and chip size, and it makes the board more complex. So adding CXL to be able to access that memory helps solve the memory bandwidth wall problem. The other thing here is that there was a study published in May by the Georgia Institute of Technology that talked about memory controller queuing. They went as far as to do some modeling on removing the DIMMs completely from the board and just attaching the memory via CXL, because you have so much data trying to access the memory controllers that it becomes a bottleneck. If you replace the memory with CXL, you actually get higher performance because you're eliminating the queuing time of the DRAM. I thought that was very interesting, and I recommend taking a look at that study. But in any case, for today's systems, we're combining onboard main-memory DRAM with CXL-attached add-in memory.
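To make the queuing argument concrete, here's a rough sketch of the intuition, my own illustration rather than anything from the study. It uses a toy M/M/1-style queuing approximation; the utilization figures and the 100-nanosecond unloaded DRAM latency are assumptions chosen purely for illustration, while the 212.8-nanosecond CXL latency is the measurement quoted later in this talk.

```python
# Toy M/M/1-style model of memory controller queuing (illustrative only).
# Approximation: loaded latency ~= unloaded latency / (1 - utilization)

def loaded_latency_ns(unloaded_ns: float, utilization: float) -> float:
    """Approximate loaded latency, inflating service time by queuing delay."""
    assert 0.0 <= utilization < 1.0
    return unloaded_ns / (1.0 - utilization)

DRAM_UNLOADED_NS = 100.0   # assumed unloaded DRAM latency
CXL_UNLOADED_NS = 212.8    # CXL latency measured on the E3.S 2T (from the talk)

# All traffic on DRAM: the controllers run hot and queuing dominates.
print(loaded_latency_ns(DRAM_UNLOADED_NS, 0.90))        # ~1000 ns

# Shift 40% of traffic to CXL: DRAM utilization drops to 0.9 * 0.6 = 0.54.
dram = loaded_latency_ns(DRAM_UNLOADED_NS, 0.54)        # ~217 ns
cxl = loaded_latency_ns(CXL_UNLOADED_NS, 0.40)          # ~355 ns
print(0.6 * dram + 0.4 * cxl)                           # ~272 ns blended average
```

Under these assumptions, shifting traffic off saturated DRAM controllers cuts their queuing delay enough to more than offset CXL's higher unloaded latency, which is the direction of the study's result.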
The other problem statement is disaggregation. You have low utilization when memory attached to a particular CPU isn't being accessed by that system, and no other system can access that memory either. That's disaggregated, or stranded, memory. CXL 3.0 is going to solve that problem, and we should see a dramatic increase in adoption as we migrate to that standard.
And various versions of this were shown as well. What I wanted to point out is that you have the spec being developed, and then you have the adoption behind that. So by the time we get to CXL 3.0 in 2026, we'll have more widespread adoption of CXL. However, it takes a lot of work to get there: a lot of iteration, a lot of system integration work, and just a lot of detailed engineering focus to get to where we are today. And we're going into 2024 adopting CXL 2.0. So it's very exciting to be part of this.
So one product I'll show here is the E3.S 2T. It's running in XConn's booth, and you can see its performance there. We estimate the theoretical latency to be about 200 nanoseconds, and the testing results came in at about 212.8 nanoseconds, so very close to the theoretical expectation. That's just over 100 nanoseconds slower than DRAM, roughly one NUMA hop difference in performance. And then on the bottom, just talking about the bandwidth: we expected about 63 gigabytes per second of bandwidth, and it came in at about 36. The only reason for that is an earlier revision of the controller being used on this particular card. We expect that to get to 64 gigabytes per second with the next generation.
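For reference, those theoretical figures line up with a x16 PCIe Gen5 link; the talk doesn't state the link width for this module, so treat that as my assumption in the quick back-of-envelope check below.

```python
# Back-of-envelope check (my arithmetic, not the speaker's) of the ~64 GB/s
# theoretical figure, assuming the module runs a x16 PCIe Gen5 / CXL link.

LANES = 16                 # assumed link width
GT_PER_LANE = 32           # PCIe Gen5 signaling rate: 32 GT/s per lane

raw_gb_s = LANES * GT_PER_LANE / 8   # GT/s -> GB/s (8 bits per byte)
print(raw_gb_s)                      # 64.0 GB/s raw, before the ~1.5%
                                     # 128b/130b encoding overhead and protocol
                                     # overheads, which is why ~63 GB/s is a
                                     # realistic practical target
```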
And the card that Frank talked about is right here. This is a dual-width add-in card with eight DIMM sockets and two x16 CXL controllers on it. The whole idea is to dramatically expand memory capacity, up to four terabytes. But the real TCO value of this is not populating that much memory; rather, it's using 64-gigabyte RDIMMs and avoiding the use of 3DS components because of their high cost. So you're using eight 64-gigabyte RDIMMs on this card and then eight 64-gigabyte RDIMMs on your main board, and you get to a one-terabyte system with the CXL enablement. You can also avoid the use of a second CPU in cases where you need a one-terabyte system. The other advantage of this card, as Jeff pointed out as well, is DDR4. There are, I think, well over 100 million DDR4 DIMMs in use today. So DDR4 can be reused in the DDR5 systems that are being deployed; the DIMMs just need to be tested and verified and then plugged into these cards. In terms of efficiency and green advocacy, that makes a good argument for reusing DDR4 DIMMs in DDR5 systems using this type of card.
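The capacity math for that one-terabyte configuration works out as follows; this is just a simple check of the numbers above, not vendor data.

```python
# Capacity arithmetic for the configuration described above: eight standard
# RDIMMs on the CXL add-in card plus eight on the main board.

RDIMM_GB = 64              # standard (non-3DS) RDIMM capacity from the talk
CARD_SOCKETS = 8           # DIMM sockets on the add-in card
BOARD_SOCKETS = 8          # main-board DIMMs, per the example in the talk

card_gb = CARD_SOCKETS * RDIMM_GB       # 512 GB of CXL-attached memory
board_gb = BOARD_SOCKETS * RDIMM_GB     # 512 GB of direct-attached memory
print((card_gb + board_gb) / 1024)      # 1.0 TB total on a single CPU socket
```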
All right, so my last slide here. As I mentioned, it takes a lot of expertise to put this memory together. It's not just a DIMM anymore; it's a very complex type of product with a CXL controller on it. There are many features, there are variations in those features, and there's a lot of testing and validation work that goes on behind the scenes. The call to action here would be to take a look at the demo we have in the XConn booth; we have some literature and samples outside. I'm also co-chair of the SNIA Persistent Memory Special Interest Group, and there will be NV versions of these products in the near future, at the beginning of next year. We want to enable and collaborate with partners within the ecosystem to launch NV-enabled CXL solutions and products in the marketplace. So thank you.