Good afternoon, everyone. My name is Hoshik Kim, head of system architecture at SK hynix. I hope you enjoyed lunch, and I hope I will not make you fall asleep after lunch. AI, big data, cloud computing: all these modern computing trends have something in common. They all require ever higher memory bandwidth and capacity to manipulate ever-growing amounts of data. Now, more than ever, memory takes center stage in computing systems, which is opening the door to memory-centric computing. Memory is a very expensive resource, averaging around 30% of server value in 2022 and forecast to exceed 40% by 2025. To address these issues, the CXL interface protocol has been proposed, with the goal of optimizing memory resources and accelerating the execution of data center workloads. Today, I'll briefly introduce our recent research projects and pathfinding activities leveraging CXL technology, which I believe is the opening gateway, or prelude, to the memory-centric computing era.
Before we start, I'd like to talk about current server CPU trends. As you already know, CPU core counts keep growing, extending the life of Moore's law with the advent of chiplet technology.
More CPU cores require more bandwidth and capacity from memory. However, DRAM technology scaling is nearing its limits, and the number of DRAM channels an SoC can integrate is also limited by its die size. These graphs show the continuously growing gaps between bandwidth and capacity requirements and DRAM provisioning capabilities. That is our challenge.
So here comes Compute Express Link, or CXL. CXL is an open industry standard for a high-speed, cache-coherent interconnect for processors, memory expansion, and accelerators. CXL facilitates cache-coherent memory access between CPUs and CXL-attached devices, which can be pure memory devices or accelerators. It also allows resource sharing for higher performance and reduced software stack complexity in a more cost-effective way. As you know, the memory system plays a significant role in determining the performance, reliability, and flexibility of computers. The goal of CXL is to enhance these attributes by allowing easier memory expansion, disaggregation, and sharing.
We believe CXL will create new opportunities in system architecture beyond what is possible today in server platforms. First, it will allow memory bandwidth and capacity expansion beyond what direct-attached DIMMs can offer. Second, since it moves memory media control from the SoC to the CXL-attached device, it will allow us to differentiate memory media behind the CXL interface, which gives us full control over our own memory media and the reliability of the device. Furthermore, since a CXL device has a memory controller inside, we can add value-adding features to our own memory controller or SoC to give our customers added benefits. Last but not least, it finally opens the door to the long-awaited disaggregation of memory.
With these new opportunities, we are envisioning a memory-centric computer system architecture with various novel memory solution products, such as a pooled memory appliance, computational memory, volatile/nonvolatile hybrid memory solutions, and so on. In the next slides, I will introduce some of our current research and pathfinding activities with various CXL memory solution prototypes.
The first research activity I'd like to show you today is our memory expansion solution with HMSDK, which stands for Heterogeneous Memory Software Development Kit. HMSDK facilitates the efficient use of CXL memory for bandwidth and capacity expansion. It provides a software library that lets developers use CXL memory implicitly, meaning without modifying their software, and also provides APIs for explicit use of CXL memory. Specifically, in HMSDK 1.1 we implemented a CXL-aware memory allocation and placement technique, which we call bandwidth-aware interleaving, and publicly announced it at last year's FMS event. This is a software-based heterogeneous interleaving technique across local DIMM channels and multiple CXL memory cards. The open-source software is now publicly available on our GitHub page, and many CPU vendors and server OEMs are already working with us to test and evaluate it.
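To make the interleaving idea concrete, here is a minimal user-space sketch using Linux's mbind(2). This is not HMSDK's actual API: the node numbers (node 0 as local DRAM, node 1 as CXL memory) and the 2:1 placement ratio are assumptions chosen for illustration.

```c
/* Sketch of bandwidth-aware interleaving: place pages across local DRAM
 * (node 0, assumed) and CXL memory (node 1, assumed) in a weighted
 * round-robin, here 2:1. Build with: gcc -o wi wi.c -lnuma */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = 300;
    unsigned char *buf = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Weighted round-robin: 2 pages to DRAM for every 1 page to CXL. */
    unsigned long dram = 1UL << 0, cxl = 1UL << 1;
    for (size_t i = 0; i < npages; i++) {
        unsigned long *mask = (i % 3 < 2) ? &dram : &cxl;
        if (mbind(buf + i * page, page, MPOL_BIND, mask, 8, 0) != 0)
            perror("mbind");      /* fails if the node is absent */
    }
    buf[0] = 1;                   /* touch to fault pages in */
    munmap(buf, npages * page);
    return 0;
}
```

Biasing more pages toward the faster tier while still spreading traffic across both tiers is what lets aggregate bandwidth exceed what either tier delivers alone.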
While HMSDK 1.1 implemented a CXL-aware memory allocation and placement policy, the HMSDK 2.0 we are currently developing focuses on a two-tier memory management technique for CXL memory. In two-tier memory management, based on the profiling of memory pages, hot pages can be promoted to fast DRAM, while cold pages can be demoted to slow CXL memory. There are two sources of overhead in two-tier memory management: page profiling and page migration. So we have implemented an efficient way to manage page migration based on the hot/cold temperature of memory pages. Specifically, we developed a technique that reduces profiling overhead in detecting the hot/cold temperature of pages by leveraging the Linux DAMON profiler instead of the page-fault-based profiling used in existing techniques such as AutoNUMA-based tiering. We also developed a technique that reduces the actual page migration overhead between fast DRAM and slow CXL memory by minimizing the CPU stalls caused by TLB shootdown procedures. We implemented this novel deferred and batched TLB shootdown technique as a Linux kernel patch, which we recently submitted to the Linux kernel developer community, and it is getting lots of positive feedback and drawing huge attention. We expect this new HMSDK 2.0 to be available for download later this year.
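As a rough illustration of the promote/demote step, here is a user-space sketch built on move_pages(2), which migrates a whole batch of pages in one call. The hotness counts, threshold, and node numbers are stand-ins; HMSDK 2.0's actual mechanism lives in the kernel and, as described above, additionally defers and batches the TLB shootdowns.

```c
/* Sketch of two-tier placement: promote hot pages to fast DRAM,
 * demote cold pages to CXL memory, in one batched move_pages(2) call.
 * Build with: gcc -o tier tier.c -lnuma */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define FAST_NODE 0        /* local DRAM (assumed node id)          */
#define SLOW_NODE 1        /* CXL memory (assumed node id)          */
#define HOT_THRESHOLD 8    /* accesses per profiling window, assumed */

/* Choose a target node per page from its access count, then migrate
 * the whole batch at once. Batching amortizes per-call overhead. */
static void retier(int pid, void **pages, const int *heat, unsigned long n) {
    int *nodes  = malloc(n * sizeof *nodes);
    int *status = malloc(n * sizeof *status);
    if (!nodes || !status) { free(nodes); free(status); return; }
    for (unsigned long i = 0; i < n; i++)
        nodes[i] = (heat[i] >= HOT_THRESHOLD) ? FAST_NODE : SLOW_NODE;
    if (move_pages(pid, n, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
    free(nodes);
    free(status);
}

int main(void) {
    long psz = sysconf(_SC_PAGESIZE);
    void *pages[4];
    int heat[4] = {12, 1, 9, 0};   /* pretend profiling result */
    unsigned char *buf = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    for (int i = 0; i < 4; i++) {
        buf[i * psz] = 1;          /* fault pages in first */
        pages[i] = buf + i * psz;
    }
    retier(0, pages, heat, 4);     /* pid 0 = calling process */
    return 0;
}
```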
The next research topic I'd like to share is our memory disaggregation, or pooled memory, system solution. CXL offers a new opportunity for rack-scale, disaggregated, on-demand memory. It will fundamentally change the way people program computers going forward, since much more memory becomes available dynamically, on demand. With memory disaggregation, we can categorize the use cases and potential benefits into two major groups: memory pooling and memory sharing. As I mentioned earlier, DRAM can account for more than 40% of overall server cost in data centers, yet up to 25% of DRAM capacity is known to be stranded. With memory pooling, we expect to minimize this stranded memory and thus improve memory utilization and save DRAM costs. With memory sharing, we can eliminate the TCP or RDMA data transfers over the network, and their software overheads, that are seen in conventional disaggregated computing systems, and thus improve performance. We can also relieve memory pressure by avoiding redundant copies of memory objects in disaggregated computing systems. With CXL technology, we believe we can finally turn the long-awaited dream of memory disaggregation into a reality.
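The sharing benefit is analogous to how POSIX shared memory replaces a socket copy between processes on one machine; with CXL memory sharing, the same pattern extends across hosts. A minimal single-machine sketch, where the object name /cxl_pool_demo is hypothetical:

```c
/* Sketch of the sharing idea: instead of copying an object between
 * processes over TCP/RDMA, both map one region and pass a reference.
 * POSIX shm stands in for a CXL shared-memory window here.
 * Build with: gcc -o share share.c -lrt */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/cxl_pool_demo";   /* hypothetical object name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

    char *region = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Producer writes in place; a consumer that maps the same name
     * sees the data with no serialization and no redundant copy. */
    strcpy(region, "zero-copy shared object");
    printf("%s\n", region);

    munmap(region, 4096);
    close(fd);
    shm_unlink(name);
    return 0;
}
```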
Now, let me introduce Niagara, which is, to the best of my knowledge, the world's first and only CXL pooled memory system prototype running real applications. It supports dynamic capacity service, which means Niagara can dynamically allocate or release memory resources for each host on demand. We call it the Elastic CXL Memory Solution, and it, too, was publicly announced and showcased at last year's FMS event. We are actively engaging and collaborating with software partners to enable these use cases and build the ecosystem for this pooled memory system. We have also developed cluster management software for data centers so that real applications can run on virtual machines backed by the Niagara prototype. We are also working with Ray (ray.io), an open-source distributed computing framework, to leverage disaggregated shared memory resources instead of TCP or RDMA data transfers. Niagara currently supports connections from up to four hosts sharing up to one terabyte of CXL pooled memory, in both memory pooling and memory sharing modes. We expect to announce the next version, Niagara 2.0, which supports connections from eight hosts, very soon, by the end of this year.
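At its core, dynamic capacity service is extent bookkeeping: the pool manager grants extents to a host when it grows and reclaims them when it shrinks. The toy sketch below invents a trivial interface for that idea; the real Niagara path involves the CXL fabric manager and device-side dynamic capacity mechanisms not shown here, and the extent size and counts are made up.

```c
/* Toy sketch of a dynamic capacity service: a pool of fixed-size
 * extents that can be granted to and reclaimed from hosts on demand. */
#include <stdio.h>

#define EXTENTS 16          /* e.g. 16 x 64 GiB extents = 1 TiB pool */
#define NO_HOST (-1)

static int owner[EXTENTS];  /* which host holds each extent */

static int grant(int host) {           /* allocate one extent to host */
    for (int i = 0; i < EXTENTS; i++)
        if (owner[i] == NO_HOST) { owner[i] = host; return i; }
    return -1;                         /* pool exhausted */
}

static void release(int extent) { owner[extent] = NO_HOST; }

int main(void) {
    for (int i = 0; i < EXTENTS; i++) owner[i] = NO_HOST;
    int a = grant(0), b = grant(1);    /* hosts 0 and 1 grow on demand */
    printf("host0 got extent %d, host1 got extent %d\n", a, b);
    release(a);                        /* host 0 shrinks; capacity returns */
    printf("extent %d is back in the pool\n", a);
    return 0;
}
```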
The next item is our computational memory solution, which we call CXL-CMS. CXL-CMS is, to our knowledge, the world's first near-memory-processing-enabled CXL memory solution prototype, and it runs a real-time, real-life data analytics application. CXL-CMS implements CXL memory expansion with near-memory processing. It reduces data movement between the CPU and the CXL-attached device and frees the CPU to do other useful work. It excels at memory-intensive operations and shows scalable performance with multiple CMS cards. CMSDK, which stands for Computational Memory Software Development Kit, provides data analytics APIs that handle data in the Apache Arrow format. It supports dynamic memory allocation to CXL memory, as with standard CXL capacity expansion memory, and it supports multiple processes and multiple CMS cards for scalable performance.
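CMSDK's Arrow-based API is not reproduced here, but the payoff of push-down computation can be sketched with a filtered sum over a columnar array: on a CMS device the scan would run near memory, and only the scalar result would cross the CXL link. The function below is an ordinary host function standing in for a hypothetical device-side kernel.

```c
/* Sketch of push-down computation: a filtered aggregation over a
 * columnar (Arrow-style) int32 array. Near-memory execution would keep
 * the scan device-side and return only the 8-byte result. */
#include <stdint.h>
#include <stdio.h>

static int64_t cms_filtered_sum(const int32_t *col, size_t n, int32_t min) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= min)       /* filter predicate pushed down */
            sum += col[i];
    return sum;                  /* one scalar crosses the link */
}

int main(void) {
    int32_t col[8] = {5, 12, 7, 40, 3, 22, 9, 18};
    printf("sum(col >= 10) = %lld\n",
           (long long)cms_filtered_sum(col, 8, 10));
    return 0;
}
```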
At last year's OCP Global Summit, we showcased the first version of our CXL computational memory solution running a real-time data analytics system called LightningDB. LightningDB is SK Telecom's proprietary real-time in-memory database, deployed in a commercial data analytics service. It supports push-down computation for machine learning functions, filters, and aggregation. We have been working closely with SK Telecom, the largest telecom company in Korea, to integrate our CXL-CMS solution with LightningDB in their real-life service application. At this year's OCP Global Summit, we are showcasing the second version of our CXL computational memory solution, which shows off more advanced features and flexibility with a real-life commercial data analytics service. CMS 2.0 now supports virtual address translation, so the host CPU can use CMS internal memory as user system memory, as well as function offloading with near-memory processing. CMS 2.0 also includes programmable cores alongside hardwired acceleration logic to enhance the flexibility of offloading functions. With this CXL computational memory, we can improve performance by reducing disk spills through memory expansion and by accelerating memory-intensive operations with near-memory processing. CMS can offer higher throughput per watt than conventional server scale-up solutions.
Lastly, I want to briefly share our current pathfinding efforts in CXL-SSD, or hybrid DRAM-NAND memory solutions. We can consider a CXL-SSD as either a memory device or a storage device, and we are currently exploring a number of different use cases and solution architectures for this kind of device. As a memory device, its performance will typically depend on the memory access pattern of the workload: it provides low-latency memory access when the DRAM cache hit ratio is high, but long latency on a DRAM cache miss. We can also tier memory internally between hot and warm pages, and we can provide memory persistency by backing data up to NAND on sudden power failure. As a storage device, it can function as a conventional block SSD using the standard CXL.io protocol, or it can be configured as byte-addressable for reads and block-addressable for writes. We could certainly add an accelerator or processing unit to make it a computational SSD. Across all these use cases and solution architectures, we are weighing design trade-offs in performance, complexity, and cost, with a focus on the use cases and customer value. Hopefully, we'll have a working prototype early next year.
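The memory-device behavior described above (DRAM latency on a hit, NAND latency on a miss) can be pictured as a DRAM cache fronting NAND. A toy read-path sketch follows; the direct-mapped policy and the sizes are illustrative assumptions, not the product design.

```c
/* Sketch of the hybrid device's read path: a direct-mapped DRAM cache
 * in front of NAND. A hit is served at DRAM latency; a miss pays the
 * NAND read and fills the cache line. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE   4096u            /* cache line = one NAND page, say */
#define NLINES 1024u            /* 4 MiB DRAM cache for the sketch */

static uint8_t  dram[NLINES][LINE];
static uint64_t tag[NLINES];
static bool     valid[NLINES];

/* Stand-in for a NAND page read (the long-latency path). */
static void nand_read(uint64_t lba, uint8_t *dst) {
    memset(dst, (int)(lba & 0xff), LINE);
}

static void dev_read(uint64_t lba, uint8_t *dst) {
    uint32_t idx = (uint32_t)(lba % NLINES);
    if (!valid[idx] || tag[idx] != lba) {   /* miss: fetch + fill */
        nand_read(lba, dram[idx]);
        tag[idx] = lba;
        valid[idx] = true;
    }
    memcpy(dst, dram[idx], LINE);           /* hit path: DRAM copy */
}

int main(void) {
    uint8_t buf[LINE];
    dev_read(42, buf);   /* miss: NAND latency */
    dev_read(42, buf);   /* hit: DRAM latency  */
    printf("byte0 = %u\n", buf[0]);
    return 0;
}
```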
So, CXL is an exciting technology that has the potential to revolutionize the way we design, build, and use memory in computer systems. We believe CXL will create new opportunities in memory system architecture through emerging use cases such as memory expansion with memory tiering, persistent memory, memory pooling, and near-memory processing, eventually opening the door toward memory as a service and a truly memory-centric computing era. The industry ecosystem must work together to resolve the many system architecture and software issues needed to bring this potential into reality, and data center industry players are already collaborating closely to introduce their first CXL-enabled systems into data center production very soon. With SK hynix's commitment to driving innovation in the way data is stored and processed, and by fostering collaborative efforts with industry ecosystem partners here at OCP, we will bring new value and opportunities to our customers with our world-leading CXL memory solution technology. Thank you very much for attending my session today.
Please visit our booth in the Expo Hall, where you can learn about and experience our latest technology and our various memory solution products. Thank you.