Well, thanks for being here. I know it's late, getting towards the end of OCP, but we're going to talk with this great panel about shaping the future of memory and interconnects for AI. My goal is to ask a few questions, get some thoughts from the panel, and then leave enough time for the audience to ask questions as well, so hopefully we make this an interactive session. Today we have a good gathering of panelists from across the ecosystem. We have Samir Rajadnya from Microsoft and Manoj Wadekar from Meta, so we have the hyperscalers representing the folks actually driving the use cases in the public cloud, and even the private cloud, at hyperscale. We have Reddy from Intel building the compute devices required for AI. And we have Siamak from Samsung building the memory devices. So right here we have a full-stack view, which makes this a good intersection for the discussion. Jumping right in, the first question I have is: why is this topic of interconnects and memory important to discuss now? What's driving the urgency? Maybe I'll open this up to Samir.
Yeah, as we have seen throughout this morning, we are talking about secondary memory here, not HBM: adding secondary memory for AI. The motivation is that there is a limit to how much HBM we can add. So how do we add this secondary memory? That's where the interconnect comes in. There are two options I talked about this morning: keep the memory close to the compute die, or move it away from the compute die. The motivation for moving it away is that you literally need physical space to put all these DIMMs or LPCAMMs, and if you need physical space, you have to move a little bit away from the compute die, and that's where the interconnect is important. Then, power is important in a data center, and every time you talk about an interconnect, you're burning power. So which is the right interconnect? It's a trade-off: if adding more memory means burning more power, that's not a good solution. We have to find the sweet spot, and we also have to look at what is available today, not just keep looking to the future. There are a few things available today that we can use. So I think that's the motivation for why we need interconnects: they're solving the memory problem here.
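To make that capacity pressure concrete, here is a minimal back-of-the-envelope sketch; every number in it is an illustrative assumption, not a figure from the panel or a vendor spec, and it only shows how quickly a workload can outgrow HBM and spill into a second memory tier reached over an interconnect:

```python
# Back-of-the-envelope sketch: when does a workload outgrow HBM and spill
# into a second memory tier reached over an interconnect?
# All numbers below are illustrative assumptions, not vendor specifications.

HBM_CAPACITY_GB = 96              # assumed HBM capacity per accelerator
MODEL_WEIGHTS_GB = 70             # assumed model weights kept resident
KV_CACHE_GB_PER_1K_SESSIONS = 40  # assumed KV-cache footprint per 1,000 sessions

def tier2_capacity_needed_gb(sessions_thousands: float) -> float:
    """Capacity that no longer fits in HBM and must live in tier-2 memory."""
    total = MODEL_WEIGHTS_GB + KV_CACHE_GB_PER_1K_SESSIONS * sessions_thousands
    return max(0.0, total - HBM_CAPACITY_GB)

for sessions in (0.5, 1, 2, 4):
    print(f"{sessions:.1f}k sessions -> "
          f"{tier2_capacity_needed_gb(sessions):.0f} GB of tier-2 memory")
```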
Great! Thanks. Manoj, did you want to add something to that?
Sure. Maybe adding a little to what Samir said, and zooming out a bit: if I look at most AI infrastructure, our compute need continues to grow, which means we are making larger and larger clusters, each accelerator is getting larger, and the amount of memory required is getting bigger. So we have problems at two scales. One is how to create a larger cluster, which, interestingly, means making it tighter: if I think of a scale-up cluster, where the GPUs have to interact very closely, the bandwidth and latency constraints keep pulling these compute units, the GPUs, closer together. So we have one problem at that level. On the other hand, as we start doing that, each accelerator is also getting more complicated; they are becoming denser. GB200 is all the excitement in the Expo Hall, and if you look at it, there too you are integrating more things. So how interconnects inside the chip work, with the chiplet community growing, is also important. From a memory perspective, as Samir said, HBM and how it interacts is important, but the tiering that will happen outside is important too. Interconnect for all of these accelerators working together in a smaller, tighter space, what we call a rack or a pod or a small, tight cluster, whatever you want to call it, is going to be important, and these interconnects, whether die-to-die, scale-up over copper, or scale-up over optical, are all relevant. Memory plays a role, as Samir said, whether it is inside the chip, close to the chip, or a little bit away from the chip; all of these are important problems and they are very tightly connected. So I think it is very important for us to talk about it. And the reason we need to talk about it now, even though you may not see deployment for all of it today, is that the cycle for AI design changes and upgrades is shortening. The regular upgrade cycle we had for CPUs in our data centers is becoming narrower for GPUs, accelerators, and AI systems. So the earlier we bring solutions out in the open and discuss them, the better it's going to be for all of us. That's why this is a crucial topic to talk about right now.
No, that's great. Thanks. One other thing I wanted to follow up on: all of you, in your various capacities, not just at your companies, whether hyperscaler, processor, or memory, are driving the thought leadership and also helping shape the collective thinking here in the OCP community. So I want to ask, and maybe I'll open this question to Reddy: what fundamentally about AI changes the system interconnects, beyond what we heard about power and what Samir said about needing a second level of interconnect? Is there something fundamentally different about AI workloads that is driving us to this point?
Yeah. So if you look at general-purpose compute workloads, typically scale-out types of services, the bandwidth requirements for communication among the servers are very low. Most of that traffic tends to be control-plane activity. Unless you're looking at scale-out storage backends or scale-out big-data backends, the bandwidth requirements are not that demanding, and most of the memory bandwidth requirements are serviced by the memory within the compute node itself. But if you look at AI training workloads, you essentially have hundreds of GPUs working together to execute a training job. Whether it is tensor parallelism, pipeline parallelism, data parallelism, or whatever hybrid combination of parallelism techniques you use, they are all working together to execute one specific job. In that context, to communicate with each other they have to go over a high-bandwidth interconnect, primarily because you're passing a lot of data back and forth. Latency plays a big role, the throughput of the interconnect plays a big role, and the local memory bandwidth on the GPU itself plays a big role. So you essentially have to have high-bandwidth memory on the GPU and a high-throughput, high-bandwidth interconnect to be able to execute the job in time. That's the reason memory and interconnect are such a significant focus in the CMS work group.
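As a rough illustration of the bandwidth pressure Reddy describes, the sketch below estimates the per-GPU interconnect bandwidth needed just to all-reduce gradients in a data-parallel job. The model size, group size, and time budget are assumptions chosen for illustration, not figures from the panel:

```python
# Rough sketch (assumed numbers, not measurements) of why scale-up bandwidth
# matters: data-parallel training all-reduces the gradients every step.

PARAMS_BILLION = 70    # assumed model size
BYTES_PER_GRAD = 2     # bf16/fp16 gradients
GPUS = 256             # assumed data-parallel group size
STEP_BUDGET_S = 0.10   # assumed share of each step allowed for communication

# A ring all-reduce moves roughly 2*(N-1)/N of the gradient bytes per GPU.
grad_bytes = PARAMS_BILLION * 1e9 * BYTES_PER_GRAD
per_gpu_bytes = 2 * (GPUS - 1) / GPUS * grad_bytes

needed_gbps = per_gpu_bytes / STEP_BUDGET_S / 1e9
print(f"~{needed_gbps:.0f} GB/s of interconnect bandwidth per GPU, "
      f"just for gradient exchange")
```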
Siamak, would you like to elaborate on that?
Sure. We touched on the words big data. If you remember, 15 years ago big data was big, and that data came mostly from sampling natural phenomena, and for that, storage was needed. Now we're talking about processing that data: not only samples of natural phenomena from people, from their observations, from their videos, but now also a lot of machine-generated data. If you think about a phenomenon whose rate of change is a function of its own value at the time, that's the definition of exponential growth. So that's why we have this exponential growth, first in data and then in execution. And as you execute on it, you come to insights, and the machine-generated data keeps coming in. Therefore we need fresh data, and the way to bring in fresh data is with bandwidth. It comes from outside. What is outside? Outside is networking; outside might be storage. It comes inside the server, and it needs to be stored somewhere before it gets processed, and that's where memory is. So that's perhaps the reason we are at this inflection point with AI.
Got it. Samir or Manoj?
Just to add one point. In our world before AI, we were all basically on PCIe and Ethernet, and PCIe moved very, very slowly; in today's context, Gen 3 to Gen 4 took many years, then Gen 5. But AI is all about bandwidth and how we can deliver it. There will be new names, UALink is there, but fundamentally everybody is chasing bandwidth. And why bandwidth? Because we are moving memory data over that bandwidth. That is why memory and interconnect are so related, and we have to drive innovation on the bandwidth side, and the media also matters, whether it's copper or optical. That is why we have to keep discussing these things, and these are highly complex things. Manoj talked about reliability this morning; these things are not going to happen if one year is the cadence for new technologies, so that is why, through CMS, we have to keep engaging and focusing on this.
That's a great point, and it's a great segue into the next question. So you mentioned this alphabet soup of standards, right? UCIe and CXL? Last year was all about Ultra Ethernet, and now there's UALink. So what's going on? Do we need all of these, or can one size fit all? Anyone?
So, if you look at the ecosystem, there are closed ecosystems in the AI space; that's a fact, and there is definite value in vertical integration. But at the same time, the economics of multi-sourcing and so on are important in the long run, and that is why standards are important, so that components from company A can work with components from company B. That's why all these standards are coming out. But fundamentally, everybody is chasing bandwidth, scale-up and scale-out; we've heard it many times here. So how it will play out, we have to keep watching, and we also have to be realistic about what's possible today, what's possible in three years, and what's possible in seven years.
Yeah, I think it's important to understand that we should expect lots of open innovations. Compute Express Link is one of them, and now you are also looking at UALink, but we also have to look at it from the perspective of which one is being used for what and why. There is a lot of focus on certain activities: UALink, as an example, is primarily targeted at scale-up, AI-centric workloads, whether training or inference, whereas CXL is being targeted primarily at memory expansion, as second-tier memory expansion. How close or how far that memory sits really depends on the workloads. So we are essentially looking at these as complementary solutions, as opposed to one competing with the other, and we do need both.
Another way to look at it is from the perspective of people, programmers. They have already put work into their software and want to keep taking advantage of it. If the value of a new technology is very high, you might be inclined to change your software, but if the value is just incremental, you don't want to change your software. So how does that apply? As Reddy said, CXL is really optimized for people who don't want to change their software. They rely on load/store semantics; they rely on the hardware doing the coherence for them. They just want more of it, more memory, more bandwidth, more things, but without changing their software, and CXL is very good for that. That model is similar to: you wake up in the morning, you want eggs for breakfast, you go to the refrigerator, you have eggs, you eat eggs, you're happy; it's just the routine you have. Then AI comes along and says: I can optimize certain things. I am okay with changing my software to move data ahead of use. I would like to move bulk data, I need more bandwidth, and things are regular, not too random. I know exactly what happens if a lot of people show up here at the OCP Summit at eight o'clock on Tuesday morning, so I know ahead of time to order breakfast for everybody, but they're going to get exactly the breakfast I give them, not their choice. For that, the software changes: you order from Amazon or the equivalent to deliver things ahead of time, it gets stored in some memory, downstairs in the lobby, and then you go there, sit down, and eat. That's the new programming model for AI.
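Here is a minimal sketch of the two programming models in that analogy, in hypothetical code; the function names and data shapes are mine, for illustration only. The first version just issues loads into a larger, hardware-coherent memory pool, CXL-style, while the second is rewritten to stage the data it will need into local memory ahead of use, as one bulk transfer:

```python
import numpy as np

# Model 1: hardware-coherent load/store (CXL-style memory expansion).
# The application simply touches a larger memory pool; coherence is handled
# by hardware, so no software change is needed beyond allocating more.
def gather_rows_loadstore(table: np.ndarray, ids: np.ndarray) -> np.ndarray:
    # Each access is an individual load resolved by the memory system.
    return table[ids].sum(axis=0)

# Model 2: software-managed data movement (NVLink/UALink-style scale-up).
# The application is changed to move the rows it knows it will need into
# fast local memory as one bulk transfer, then computes on the local copy.
def gather_rows_staged(table: np.ndarray, ids: np.ndarray) -> np.ndarray:
    staged = np.ascontiguousarray(table[ids])  # stand-in for a bulk DMA copy
    return staged.sum(axis=0)

# Tiny usage example with synthetic data: both models compute the same result.
table = np.random.rand(100_000, 128).astype(np.float32)
ids = np.random.randint(0, table.shape[0], size=4096)
assert np.allclose(gather_rows_loadstore(table, ids), gather_rows_staged(table, ids))
```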
That's a great analogy; it's making me hungry now. Since we are at OCP and there are a lot of folks here, I think the OCP attendance has maybe doubled this year, just eyeballing it, my question is: how is OCP actively addressing all of these items you mentioned about AI? There is this alphabet soup of standards coming up, and these new interconnects and fabrics. So how is OCP, as a community, helping make sense of these things, break them down, and provide some direction for the collective industry to make meaningful progress?
First of all, I think this is a problem that everybody has gotten into only in the last few years. The problem has been there, but the demand for it has been really phenomenal. It always starts with somebody bringing out a great solution, then a few people start bringing out great solutions. But the scale and the pace of the problem are such that it really requires participation from the whole industry to bring the innovations and the solutions. That's one aspect. The second thing is the reliability we talked about: in AI systems, a single node or single component failing can bring down a job that runs for days, weeks, or months. So you really want reliability to be part of the overall solution, and more people working on these things in an interoperable fashion gives a better chance of significantly more reliable systems. All of these things are important, and OCP is a perfect place for this because it brings out open hardware discussions in various groups. In this group we focus on the composable memory systems problem, which includes what is supported in CPUs, in accelerators, and in the memory systems, and what interconnects we have. AI of course has a lot of more complex problems at a very high, open-systems level, and there's a new initiative that OCP has kicked off to bring those discussions together, but our group stays focused here, discussing technologies that are not only there for adoption today. We see products in the Innovation Village today that we talked about two years back, showing how CXL can enable these tiered or composable systems. Now we see in the lab that Seagate has a device, there are memory solutions on display, and Astera Labs talked about their own solution. So we have a bunch of these solutions being demonstrated right now, and that is a key achievement for OCP to talk about. All of them are running consistent benchmarks, which is important for us; we have talked about what use cases we have and what benchmarks we will use with them. So this is what OCP brings to the table, and we want to talk about those things. Today we heard about many things that are futuristic, near-memory compute, the new interconnects. Those will happen in the future, but that future is not too far away for AI systems. So OCP's goal is to be really open on the use cases, the benchmarks, the potential solutions, and how people can collaborate with each other. Even though OCP does not define the standards themselves, OCP facilitates a lot of dialogue in the open that allows us to go out to the standards bodies and have the discussion. The CXL standard will be defined in the CXL Consortium, but we talk about what we want the solution to be; JEDEC does the same for memory. So OCP becomes this really open platform where we are all collaborating.
Yeah, I'll take a simple example. If you look at CXL, the CXL Consortium defines the standard, but you need device vendors to actually go and build the CXL memory buffer, and you need a certain element of standardization, conformance to specs and defining the specs for the design, which includes thermal, power, validation, mechanical, everything. So you end up with, let's say, a memory appliance based on CXL standards. Now, what do you do with it? You can showcase it on the floor, but you need an element of software that actually runs on top of it, and it's not going to work if it is only the data plane; you need a control plane. So there is a control-plane and data-plane software stack running on top of the hardware that is defined and designed through OCP specifications. But even with all of that, you need to be able to say: how am I going to characterize my system? You need workloads. How am I going to benchmark it? You need benchmarks. How am I going to measure different components? You need standard metrics. And then, of course, what are you going to use it for? You need use cases. All these things coming together is a big deal, and I see OCP right in the middle of it.
I'll just add one more point here. Standards bodies are doing good work, but they are focused only on the standards; many times they are not focused on use cases and products. They do a great job on protocols. When we started working on OCP CMS, and in our day jobs when we engage with vendors, if I tell a vendor there is this one nice feature I would like them to consider, they are not going to listen to me on day one, because silicon companies, we know the business, have to invest a lot of money for any new change. This is where the collective wisdom comes together. When we participate through OCP, we talk about use cases, because that is what they want to hear, so we bring that to the table. And when we collectively talk about use cases, that also helps company-to-company collaboration. This is a platform where, when everybody talks about the same thing and the value comes out, the vendors are going to put that feature in. That's the great thing about OCP: standards bodies do their specific job, but we are product builders here, so OCP is the right platform for that.
If I may add, another very important thing OCP provides to all of us is a very reasonable legal framework for competitors to work together, and for suppliers and consumers to work together. Working together and collaboration are key, but having a legal framework so that people feel comfortable sharing their inner thoughts is also very important. From the beginning of OCP, the concept of openness has been important, but it was not easy for everybody to adopt it from the beginning; it had to come through demonstrating trust and demonstrating contributions. That's why, when we ask people to come up with something, we want that thing to be useful, and we want it to be collaborated on before contribution, so that when it goes out, it gets adopted. So what we've done over the past several years is come to stages like this: every year we dream big, we think about something that's very hard to do in the future, we challenge each other, and then we go and start collaborating. Work streams and specifications get built throughout the year, you go from one global summit to the local summits, and then you come back, show it, and execute on it. So dream big, collaborate, and execute.
That's great. That's a great takeaway. I'd like to ask one more question and then open it up to the audience, leaving enough time. We've mostly spoken about electrical interconnects and standards. My question is: what are OCP's plans, or is there anything concrete on the horizon, to incorporate optical interconnects and fabrics into the ongoing initiatives? Could any of you comment on that?
One thing I would say is that there are other forums in OCP beyond this one; we are more on the memory systems side, and those other groups are focused on optical technology. What we are trying to do here is bring in memory in a systematic way, again with bandwidth in mind, and this is where optics comes in. We are not all optics experts here, but we can bring the interconnect-and-optics perspective. Still, the root problem we are trying to solve is how we bring in connected memory. So yes, we can work together. OCP has a lot of swim lanes, I guess: there is a chiplet effort, there is UCIe, and each group stays focused on its own lane. Here we are focused on memory systems, and all of these, interconnects, standards, optics or copper, are tools to get where we want to go.
One more aspect. I can't say exactly what we will be doing, but I can lay out two problems that we have, at least from the AI systems perspective. As I started saying, there is going to be a bandwidth-reach problem in the kind of scale-up environment we are looking at, because the amount of bandwidth we need continues to grow and the cluster size we see in scale-up is also increasing. With increasing bandwidth, the reach of copper is going to become more and more challenging as we go from 100 gig per lane to 200 to 400. So at some point we are going to see exploration and comparison of the best technologies to make sure these GPUs can continue to work very tightly together in scale-up systems. That is one aspect of the problem. The second aspect is that the scale-up interconnect today practically drives memory traffic between GPUs, HBM-to-HBM data movement. Now, as Samir was saying, we are going to start seeing more memory use cases where we need to expand the memory. If you look at the form factor of a rack, or a pod, whatever you want to call it, that space is limited: the GPUs or accelerators you put there, the number of switches you put there, all of it shares the same square millimeters or square feet. How that space is most optimally used from the compute and network or interconnect perspective is the challenge. So this is where, again, we are going to talk about how many wires there are, what kinds of connectors there are, how much distance I need to cover, and how I optimize it in my data center. I'm not going to give you an answer, but these are the problems we're working on, which are very much live for us right now from a deployment perspective, and I think optical will play in that space.
Just a small addition. Again, this is why the Composable Memory Systems team was formed: we first thought about disaggregating compute, and then techniques such as CXL enabled disaggregation, for good reasons. Not everything can fit in the same place, because of power and cooling and the distances we need for things to be fast, so we needed to disaggregate. But once we disaggregate things, we need to compose them back together so that it looks the same to the software and to the programmer. That is the Composable Memory Systems work group. But as we describe problems, we need to think about what problems we are trying to solve, and we can imagine big, again, dream big. I was asked to present something to the Short Reach Optical Interface team yesterday, and the challenge to all of us is: how do you build a system, a super-cluster of 1 million GPUs, reliably, quickly, and cost-effectively? That is the challenge in front of all of us. When we look at that, of course we would like to avoid new technologies; to the extent that we can use the old technology, we build smaller systems. But as systems get bigger and bigger because the problems are bigger and bigger, we need different hierarchies, and if we have these different hierarchies, we need memory stages and we need better interconnect techniques. Once we understand all of those as problems, then as physicists or technologists or engineers we can think about the properties of certain technologies being appropriate. Photonics, an optical interface, compared to copper has these properties: the signal, being light, is immune to electrical noise; that's a good thing. We can send higher bit rates through that fiber; that's a good thing. It can go longer; that's a good thing. Another thing that's very important when we want to interconnect many, many things together is that photonic and optical components can be small. If they are small, then using the same shoreline we can go to many more places. That reduces the number of switch layers we have to have, which reduces latency, complexity, and the cost of developing large systems. So photonics and optics have these properties; what's left is for us to realize and commercialize them.
That's great. So let me open it up. We have a couple of mics out here. Does anyone in the audience have a question? Please come up to the mic. I see there's one question out here. Go ahead.
Should we start co-packaging photonics engines on accelerators and CPUs, to kind of bring all that closer and be more, I guess, energy efficient than having pluggable or copper for the same constraints?
Yes, could we have it next year?
Say that again?
Yes, please. Offer it next year.
I'll get working on it.
Good for you.
Maybe just to add that. I mean, you are doing that today. We are seeing that on the switch side, on the inter-network side today, right? I mean, once you exit the top of the rack. The question is basically, I think your question is basically, does that get integrated in the compute side, right? And I think the key question comes from that. Are we at the right cost? Right? Are we at the right power? Are we at the right reliability? And of course, what kind of maturity will we get? But I think that is a path that we will have to explore at some point of time. Where is the transition? Where is the crossover? At what point does that become interesting and extremely appealing, or maybe really compelling to that level?
So, he means yes. Give it to me next year, please.
All right, why don't we take the next question? Thanks for that question. Pankaj?
Nilesh, thanks for the great questions to the panel and the great answers so far. I'm still troubled a little bit by the coexistence of CXL and UALink and exactly what shape it will take. You made it abundantly clear why there needs to be more than one, but how will they come together in a real system in a meaningful way? So let me pose a specific question, maybe directed at Siamak. It appears that CXL is very firmly established in devices, both memory modules and SSDs and such, and then its future becomes less certain as you get closer to the accelerators, where there is some refusal to talk about anything other than UALink or NVLink, et cetera. So the question really is: as you device vendors think about it, should you keep building CXL into your device interfaces, and with what value? How do you see that bridging up to the top? And what characteristics are still missing where OCP could help?
Again, using analogies, I'll try to illustrate this. First, we have different tools for different use cases and requirements. Specifically about CXL: as you know, load/store, cache-coherent semantics can give us roughly a 200-nanosecond response time to, say, a read command. Now, at Hot Chips, people published data on what a plain write over NVLink costs: about 2.3 microseconds for just a write. And that's OK, because they can handle it, because you can move data ahead of use; you don't care. If you're willing to change your software, you can do that and use a lot of bandwidth. You put people on a train and send them from one city to another; but if I want to go to my local store, I don't want to get on a train, I want to get there quick. These are just simple analogies to illustrate that if you don't want to change your software, if all you need is memory expansion around the CPU, you can take multiple CXL links off the good CPUs that are coming out and expand your memory; your software doesn't have to change a bit, and you just get it. Now, if you want to do more, if you want to extend from one place to another, you might need photonics, and the CXL protocol can run over photonics. If that's not good enough, then you increase the bit rate, and one model for increasing the bit rate is NVLink or UALink; even then, you have to aggregate multiple links to get the bandwidth, and you use that tool in that model. So software coherence can be just fine for RDMA or for UALink and NVLink, and for hardware coherence, CXL is the right tool.
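To put rough numbers behind that train-versus-local-store analogy, here is a small sketch using the per-operation figures quoted above. The transfer sizes are illustrative assumptions, and the math deliberately ignores link bandwidth limits and pipelining, so it only shows how bulk transfers amortize a high per-operation latency while fine-grained loads do not:

```python
# Per-operation latencies as quoted in the discussion above.
CXL_LOAD_LATENCY_S = 200e-9      # ~200 ns load/store response time
NVLINK_WRITE_LATENCY_S = 2.3e-6  # ~2.3 us for a single write

def latency_bound_gbps(bytes_per_op: int, latency_s: float) -> float:
    """Throughput if every operation serially pays the full per-op latency
    (ignores link bandwidth caps and pipelining; for illustration only)."""
    return bytes_per_op / latency_s / 1e9

for size in (64, 4096, 1 << 20):  # cache line, page, 1 MiB bulk transfer
    cxl = latency_bound_gbps(size, CXL_LOAD_LATENCY_S)
    bulk = latency_bound_gbps(size, NVLINK_WRITE_LATENCY_S)
    print(f"{size:>8} B/op: load/store ~{cxl:7.2f} GB/s, bulk-style ~{bulk:7.2f} GB/s")
```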
I see Hoshik has a question. Maybe some of you have a comment first.
Yes, I have one point. I think CXL memory has a place in the CPU-centric server, and that will continue, because we don't have that much of a bandwidth problem there. In the AI space, now that UALink is coming, what is UALink? They're bringing a lot of the goodness of PCIe and CXL there, and they're using Ethernet underneath. If there were no ecosystem naming problems, one could have named it CXL 4.0. That's how one should look at it: new names get thrown around, but that doesn't mean things are all that different.
All right, go ahead, Hoshik.
Yeah, thank you. I have a comment, or a suggestion, rather than a question. We've been talking about this rosy future and the promises of CXL for a while now, for the past several years. But we have to realize and admit that there are also challenges we've been facing. I'm a strong believer and strong supporter of CXL, but there are still unbelievers. So rather than just talking among ourselves about rosy pictures, we have to list all the challenges and practical issues we have to resolve in order to make this a real, meaningful commercial success. We have to be realistic: list all the challenges and the unbelievers, address them, find the solutions or the justification, and then communicate that to the CXL customers.
So, I was just going to say, Hoshik: is that an action SK Hynix is signing up for, to start the list, and then we can all add to it at OCP?
Yes, okay. I already have a list of problems I'm trying to solve.
I would love to work with you on that, but go ahead.
I think this is a good point. CXL is a technology that enables certain solutions. As Samir said, we are here to talk about what the requirements are and what drives them, because that's what matters for product development. For CXL memory expansion, we are doing memory expansion from tier one to tier two, which means tier two has to provide capacity, and we should write down exactly what that means; obviously it means tier two needs to provide those capabilities at the right TCO optimization. On the network, or scale-up network, side, the discussion we had about CXL versus UALink versus RDMA solves a slightly different problem. It is memory-to-memory movement, still tier one to tier one, though sometimes there will be tier-one-to-tier-two memory expansion happening as well. There, the problem is different: bandwidth and latency, and scale is a really important aspect of it. So I want to separate those two things: memory expansion for general-purpose compute, driven by TCO optimization, versus the AI system's back-end or scale-up connectivity.
Yes, yes. We have to categorize.
Yes. And then we have to address that. Right. This is a very important point, because people ask: is CXL relevant for AI, or is CXL relevant at all? The point is, CXL is absolutely relevant for general-purpose computing. The discussion has been more about the scale-up side of the networking, which is a completely different problem statement.
But by the way, Manoj, so actually, to Hoshik's point, what would be the right forum? Would it be the CMS forum to have this discussion?
Absolutely. I think we have the right people, and we have the right problem-and-solutions discussion forum here, which meets every time. So, absolutely, this is the right forum to have that discussion.
Prakash.
Yeah, there's a question.
I think we really needed more chairs up here; we could have seated Hoshik and Prakash.
I was actually trying to make a comment about Pankaj's question on why CXL did not become the scale-up network. I think we brought this up at OCP last year, laying out what would need to happen for CXL to be the scale-up network of the AI era. What we've learned is that CXL, because of its backward-compatibility needs and because it addresses a much larger scope of use cases, can't easily evolve as fast as something built from the ground up. UALink was in part a response to that, because what we were trying to solve at the CXL Consortium would have taken much longer than the product cycles needed. So that's one explanation of why CXL 3.x did not support the scale-up fabric use case.
Thank you very much, Prakash. Prakash is a director on the board of directors of the CXL Consortium.
I think we are out of time. Thank you for sharing your thoughts, and also to the audience for coming up and asking some challenging questions. Thanks.