YouTube: https://www.youtube.com/watch?v=u5OCpXKx9xo
Text:
Hey everybody, my name is Grant Mackey. I'm the CTO at Jackrabbit Labs, and today I'm here to talk to you a little bit about CXL fabric orchestration and management, and our command line interface tool for the codebase we've open sourced for this, which we call Jack. Hence my talk: "You Don't Know Jack."
Jackrabbit Labs, I think, is the first open source company for software services around shared memory fabrics. A little while ago we noticed there was a gap in the ecosystem, specifically for CXL fabrics but for memory fabrics in general as they start to come out: folks are really working on the hardware side and the driver side, getting the OS connected to these devices, but there's a really large gap in the middleware layers that connect end users to these very cool technologies. So that's our focus: to push ourselves and the industry forward so that when these things are mature, stable, and ready for consumption, people can just plug them in, right?
So why open source? Historically, open source has been the best way to reduce fragmentation, because there's nothing proprietary, no tribal secrets; it's just "these are the things we should march toward," and it keeps folks from going off in their own directions, because that fragmentation really slows down industry adoption and the maturity and stability of codebases. The CXL spec is at 3.1 now, and there's been a lot of good work in that area. For compute, Intel is about to release their 2.0 stuff; it should be GA now-ish, same with AMD. For switches, we've got XConn, and I think we've got other people coming up soon, and that'll be great. We have a bunch of different memory controllers, a bunch of different memory modules, and a number of appliances that are all trying to plumb from the base all the way to the top of the stack. What that's generating is a lot of different RESTful interfaces, libraries, CLIs, GUIs, APIs, documentation, bespoke extensions, I2C and I3C implementations, Redfish proposals, etc. These are all good, they're moving things forward, but they're all doing similar things, and a new one seems to pop up every other day. Keeping track of all of them is daunting for a lot of people, and that's a real pain point for the person who's going to be at the end of it, looking for a standardized software stack. So that's our core focus: getting things to a point where somebody running a Kubernetes cluster, or some other large resource orchestrator, understands what they need to do to get from A to B.
This is my CXL FM API slide. There's not a lot to talk about here: it's the fabric management API for the CXL spec, and it's just that, an interface that completes a set of actions on the fabric and nothing else. You're just pushing byte streams around, and those byte streams perform actions, but there's no state management and no history. That's not orchestration; that's just actions. So the things that would consume this later don't want to know how to speak FM API, not only because it's specific to CXL and they'd have to manage yet another set of state beyond what they're already managing, but because that command set grows quickly. In CXL 2.0 there were four sets of commands (not four commands, four sets of commands). In 3.1, which is out now, there are nine, and really nine and a half, because adding port-based routing to the physical switch command set added more commands. So it's a large jump from 2.0 to 3.1, and keeping track of all that, and updating all of it, in a framework that historically hasn't had to do these sorts of things is a big ask for end users. So we want to provide something in between everybody else and the FM API. That's where we're headed.
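To make "byte streams that do actions" concrete, here is a minimal sketch, in Python, of what talking to the FM API looks like from software. The framing below is simplified rather than the spec's actual CCI message format, the socket transport is a stand-in, and the opcode values, while modeled on the command-set layout in the CXL 2.0 spec, should be treated as illustrative:

```python
import socket
import struct

# Illustrative FM API opcodes (command set in the high byte). These follow
# the CXL 2.0 spec's layout but are shown for illustration; check the spec
# before relying on them.
IDENTIFY_SWITCH_DEVICE = 0x5100    # Physical Switch command set
GET_PHYSICAL_PORT_STATE = 0x5101
BIND_VPPB = 0x5201                 # Virtual CXL Switch command set


def send_fm_api_command(sock: socket.socket, opcode: int, payload: bytes = b"") -> None:
    """Fire one FM API request at the fabric and forget about it.

    This is the point of the slide: the FM API is a byte stream that
    triggers an action. Nothing here remembers what the fabric looked
    like before or should look like after. (The header framing is a
    simplification, not the spec's CCI message format.)
    """
    header = struct.pack("<HI", opcode, len(payload))  # opcode + payload length
    sock.sendall(header + payload)


# Note what is absent: no state object, no history, no rollback.
# Orchestration has to live in a layer above this.
```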
That being said, this is Jack. It's a CXL fabric management interface, a wrapper around a set of libraries that we put together and open sourced a few months ago; the link is down there at the bottom. Between Jack and its counterpart, the CSE (the CXL Switch Emulator), you can test out all of the FM API 2.0 commands in a standard, repeatable way, and just play around with it and see how you can do different types of CXL orchestration. So we're doing things like showing the port states: what the ports do, what types of devices they are, the speeds they support, lane widths, stuff like that. You can create and tear down VCSes, you can create regions, you can connect and disconnect things from hosts, sort of. And it's all compliant. But this is really just a first step, right? It's an existence proof of something that manages the FM API, but it doesn't do orchestration. It just does management.
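For flavor, this is roughly the shape of a session against the emulated switch, wrapped in Python. The subcommand names below are hypothetical stand-ins, not Jack's actual syntax; the repository linked on the slide documents the real command set:

```python
import subprocess

def jack(*args: str) -> str:
    """Thin wrapper that shells out to the Jack CLI and returns stdout."""
    result = subprocess.run(["jack", *args], capture_output=True, text=True, check=True)
    return result.stdout

# Hypothetical session against the CSE (CXL Switch Emulator):
print(jack("show", "ports"))                              # port states, device types, speeds, widths
jack("create", "vcs", "--ports", "0,1")                   # carve out a virtual CXL switch
jack("bind", "--vcs", "0", "--vppb", "1", "--port", "3")  # attach a device to a host
```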
You need a platform, right? Right now in the CXL spec (and I think this was appropriate, especially when the spec was first started), when you give a CXL device to a host, the host owns it until it's done with it. There was a reason for this. The CXL Consortium, I think, appropriately punted, because when they were putting this together, operating systems really weren't used to ripping out or adding a DRAM DIMM, or unplugging or plugging in a PCIe add-in card, while the system was on. That's when the spec started. So what they said was: okay, the FM API can manage devices that are not yet assigned, and it can add devices to hosts. But once the host owns the device, that's it. The fabric can check that it still exists, but that's about all the FM API can do. So if I have two hosts, and host A owns the CXL device, and host B needs it, host B can't take it. Host A owns it forever. So the platform aspect of this is: perhaps I have extracurriculars to the CXL spec that enhance the flexibility and composability of a CXL memory fabric but don't overlap into the spec's domain, so the spec can do its standards the way it needs to, while the industry moves forward on the things that make the spec more useful.

So this is something where we envision the hosts running fabric daemons, and the CXL switch, or something on the fabric, running a fabric orchestrator. That orchestrator maintains state and so on, but it also interacts with these fabric daemons so that resources can be reclaimed correctly and reassigned elsewhere, so that your CXL fabric isn't a set-it-and-forget-it sort of situation. In this situation, we have a user application, and host B has asked the fabric orchestrator for resources. The fabric orchestrator sees that host A has resources that host B could use. So the fabric orchestrator goes to host A's fabric daemon and says, "Hey, are you using that thing? I need it." The fabric daemon speaks with the host OS, because there are mechanisms to properly unplug CXL devices and memory regions from a host, and says, "Let these things go. Migrate the resources you need to migrate; let's take this device offline and unplug it." Those things occur, and the fabric daemon acks the release back to the fabric orchestrator. Then the fabric orchestrator speaks with host B's daemon and says, "Here's this thing. You have it now. It's been assigned. I did all the VCS setup and region creation and so on for you. Tell the user application it can use it now." And that's it. There's no way to do this in the CXL spec. People are talking about composable memory fabrics, but there's no middleware today that does these sorts of things. So this is where our efforts are focused right now.
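Here is a minimal sketch of that reclaim flow. Every name and message shape in it is an assumption made for illustration; none of this exists in the CXL spec, which is exactly the gap being described:

```python
class FabricDaemon:
    """Runs on each host; knows how to give a device up gracefully."""

    def __init__(self, host: str, owned_devices: set[str]):
        self.host = host
        self.owned = owned_devices

    def release(self, device: str) -> bool:
        # A real daemon would ask the host OS to migrate pages off the
        # region and offline it via the memory hot-unplug path first.
        if device in self.owned:
            self.owned.remove(device)
            return True  # the "ack" back to the orchestrator
        return False


class FabricOrchestrator:
    """Runs on or near the switch; the only component holding fabric state."""

    def __init__(self, daemons: dict[str, FabricDaemon]):
        self.daemons = daemons

    def request_memory(self, requester: str, device: str) -> None:
        # Find the current owner and ask its daemon to let the device go.
        for daemon in self.daemons.values():
            if device in daemon.owned and daemon.release(device):
                # The FM API work would happen here: unbind from host A,
                # rebuild the region, bind into host B's VCS.
                self.daemons[requester].owned.add(device)
                return
        raise RuntimeError(f"{device} could not be reclaimed")


# Host B asks for a device that host A currently owns:
a = FabricDaemon("hostA", {"mem0"})
b = FabricDaemon("hostB", set())
orchestrator = FabricOrchestrator({"hostA": a, "hostB": b})
orchestrator.request_memory("hostB", "mem0")
assert "mem0" in b.owned
```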
Why? The why of that is that these large cluster schedulers don't actually want to know how to do this. They just want to express intent to other things and let those other things do it for them, because they're already focused on doing one set of things well. These are my examples: the Kubernetes and OpenShift folks have the CRI, the CSI, the CNI, and then this other device plug-in support thing for stuff that doesn't fit into those resource containers. The Apache folks completely punt: as long as it speaks Java and TCP/IP, the underlying hardware just doesn't matter at all. Proxmox uses Ceph for large distributed storage, so it doesn't have to think about large volumes of storage, and Corosync for cluster state synchronization and VM failover. And OpenStack just keeps adding new projects; they have 41 types of resource services in their ecosystem. They all have varying levels of hardware puntage, underlying-system-resource puntage, so they can focus on the thing they do well and offload the rest elsewhere. So this is the thing that the CXL community, and memory fabrics in general, need to be aware of: trying to shove hard primitives into these types of resource frameworks is probably not going to fly well with these communities.
So, this is what we're looking to do at OCP; this is our OCP demo. We have Kubernetes, we have an emulated CXL switch for now, and we have CXL devices at the bottom. And there's a memory fabric API. This memory fabric API is in progress, meaning the primitive mechanics are straightforward ("What do you have?", "I need stuff", "I'm done with this stuff"), but the actual implementation characteristics are something the industry needs to drive; we're just taking a first step as a demonstration at OCP. So you have a kube scheduler, and it has this work that says, okay, I have a pod on kubelet X that needs this much more memory. It expresses that intent to the memory fabric API (sketched below), and that API converts it into CXL resource requests that the fabric daemon understands. The fabric daemon, like in the previous example, communicates with the fabric orchestrator and says, these are the things this host needs. So the fabric orchestrator goes in: first it decides whether that's even feasible, and then it makes those changes on the underlying hardware to create VCSes, collapse VCSes, create memory regions, interleave devices, blah, blah, blah. When it brings that stuff online, it communicates back to the fabric daemon on the kubelet that needed those resources and says, here are these things; you have them. The fabric orchestrator goes back to the fabric daemon running on the kube scheduler node and says, those resources are complete; everything should be good to go. And then, finally, the kube scheduler can do what it's good at: informing the pod that was about to OOM, about to get its out-of-memory exception, that in case you didn't notice, you have more memory now. So please don't off yourself; continue to provide that service. This is very useful, and it's a good amount of work, and we're getting there. We're working with different partners to make sure we're not developing in a bubble, and as we put these components together and test them, we're putting them up on GitHub and our own internal Git servers for use.
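As a sketch of what "expressing intent" to that in-progress memory fabric API might look like: the endpoint, request shape, and field names here are all assumptions for illustration, since the real interface is what the demo and the community end up converging on:

```python
import json
from urllib import request

def request_more_memory(kubelet: str, additional_bytes: int) -> dict:
    """The scheduler only states intent: 'this kubelet needs this much
    more memory.' Turning that into VCS changes, region creation, and
    interleaving is entirely the fabric layer's problem."""
    intent = {"host": kubelet, "resource": "memory", "additional_bytes": additional_bytes}
    req = request.Request(
        "http://fabric-api.local/v0/intents",  # hypothetical endpoint
        data=json.dumps(intent).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)  # e.g. {"status": "fulfilled", "region": "region0"}

# The pod that was about to OOM never sees any of this; it just finds
# more memory online once the fabric daemon finishes hot-adding it.
```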
And we're hoping to drive that adoption to cut down on fragmentation, so that the people who are interested in their respective parts can focus on those parts, rather than trying to plumb the entire ecosystem a particular way and confusing the people who would ultimately consume this technology. So those are our challenges. And our calls to action: if you don't have CXL hardware, you can do all of this in QEMU emulation. There are various ways to do that, and a bunch of different guides online that you can find. QEMU emulation lets you play around with software and application development, so you don't have to wait for hardware. You can start developing now and evaluate what's there, to see what's deficient, what's buggy, what's confusing, so those things can be addressed now, while the hardware ecosystem gets to the point where CXL 2.0 hardware is generally available and 3.0/3.1 hardware arrives in the next couple of years. That way there's not another five years after the hardware of sitting around before everyone says, "Oh, okay, this is useful now." This is a small list of all the projects I was talking about: Intel stuff, our stuff, Samsung stuff, Micron stuff, Hynix stuff, more Micron stuff, more Intel stuff, all of the QEMU work being done adjacent to the Linux kernel, and Samsung recently put out libcxlmi, a CXL management interface library. They all sort of circle around similar things, and we could stand to collapse some of these efforts into something more focused. So that's me. Please download our stuff. Tell us it doesn't make sense. Thank you.
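If you want to try the QEMU route, here is a minimal single-host CXL topology, wrapped in Python for consistency with the sketches above. The machine and device options follow the upstream QEMU CXL documentation (docs/system/devices/cxl.rst), but they vary by QEMU version, so treat this as a template to check against your build; you would still add your usual boot disk and networking options:

```python
import subprocess

# One pxb-cxl host bridge, one root port, one type-3 memory device, and a
# fixed memory window, per the example in QEMU's CXL docs. Assumes a QEMU
# build with CXL support and a guest kernel with the CXL drivers enabled.
qemu_args = [
    "qemu-system-x86_64",
    "-M", "q35,cxl=on", "-m", "4G", "-smp", "4",
    "-object", "memory-backend-file,id=cxl-mem0,share=on,"
               "mem-path=/tmp/cxl-mem0.raw,size=256M",
    "-object", "memory-backend-file,id=cxl-lsa0,share=on,"
               "mem-path=/tmp/cxl-lsa0.raw,size=256M",
    "-device", "pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1",
    "-device", "cxl-rp,port=0,bus=cxl.1,id=rp0,chassis=0,slot=2",
    "-device", "cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0",
    "-M", "cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G",
    # ...plus your usual disk and netdev options for a bootable guest.
]
subprocess.run(qemu_args, check=True)
```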