Description
Motivation
In CCCL < 3.2, we can construct resource_ref types from device_memory_resource*. That ability was removed in CCCL 3.2 as part of the new memory resource design (#2011). The removal was needed for several reasons, including the requirement that resource types be copyable, which lets us reify a resource_ref into an any_resource. This is the core motivation for the new memory resource design in CCCL, and it solves a large number of memory resource lifetime management and language-boundary problems that RMM has faced since its inception. Because device_memory_resource is a virtual base class, we can't pass it around by value as device_memory_resource; we must instead use a pointer, device_memory_resource*, that points to a concrete implementation class. This puts us in a catch-22: we can't use CCCL 3.2 with existing RMM APIs, but we can't migrate the rest of RAPIDS until RMM has a solution that works with both CCCL < 3.2 and >= 3.2. We must find a way to bridge this gap.
Goal
We need a way to bridge from our current state where device_memory_resource* is still sometimes used in the API of RMM and consuming libraries, to a future state where everything uses resource_ref types in APIs and uses any_resource for (maybe-shared) ownership.
To do this, we need a new (internal) resource type in RMM that holds a device_memory_resource* but is copyable, thus meeting the requirements of CCCL resources. It does not attempt to solve the lifetime problems that come with using raw pointers, which is the status quo in RMM (it is documented that memory resources must outlive their allocations). I propose calling this type device_memory_resource_view, and putting it in a detail:: namespace unless we find some compelling reason to do otherwise. This type is only meant to be a stopgap until we can fully migrate to a resource_ref / any_resource design across all of RAPIDS. At that point, we will remove the virtual base class device_memory_resource along with all APIs accepting/using device_memory_resource* in favor of a design that is purely based on CCCL MR concepts. It is hard to migrate to that state directly, without an interim solution, so this fills the gap. The final state will not have any unsafe lifetime issues because of the (maybe-shared) ownership design of any_resource, and the intermediate state is no less safe than RMM's existing pointer-based design.
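To make the shape of the proposed type concrete, here is a minimal host-only sketch. It is not RMM or CCCL code: the device_memory_resource below is a simplified stand-in for the real base class, counting_host_resource is a hypothetical resource invented for illustration, and the allocate/deallocate signatures are assumptions. The real CCCL resource concepts also involve alignment, streams, and properties, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Simplified stand-in for RMM's virtual base class (assumed interface).
// It is abstract, so it can only be used through a pointer or reference.
class device_memory_resource {
 public:
  virtual ~device_memory_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes) = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) = 0;
};

// Hypothetical concrete resource: host malloc/free plus a live-allocation counter.
class counting_host_resource final : public device_memory_resource {
 public:
  std::size_t outstanding{0};

 private:
  void* do_allocate(std::size_t bytes) override
  {
    ++outstanding;
    return std::malloc(bytes);
  }
  void do_deallocate(void* ptr, std::size_t) override
  {
    --outstanding;
    std::free(ptr);
  }
};

namespace detail {

// Copyable, non-owning wrapper around a raw device_memory_resource*.
// It does not manage lifetime: the pointee must outlive every copy of the
// view and every allocation made through it, as with raw pointers today.
class device_memory_resource_view {
 public:
  explicit device_memory_resource_view(device_memory_resource* mr) noexcept : mr_{mr}
  {
    assert(mr != nullptr);
  }

  void* allocate(std::size_t bytes) const { return mr_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) const { mr_->deallocate(ptr, bytes); }

  // Two views compare equal when they refer to the same underlying resource.
  friend bool operator==(device_memory_resource_view const& a,
                         device_memory_resource_view const& b) noexcept
  {
    return a.mr_ == b.mr_;
  }
  friend bool operator!=(device_memory_resource_view const& a,
                         device_memory_resource_view const& b) noexcept
  {
    return !(a == b);
  }

 private:
  device_memory_resource* mr_;
};

}  // namespace detail

// The base class is abstract; the view is an ordinary copyable value type.
static_assert(std::is_abstract<device_memory_resource>::value,
              "the stand-in base class is abstract");
static_assert(std::is_copy_constructible<detail::device_memory_resource_view>::value,
              "the view is copyable");
```

Copying the view copies only the pointer, so every copy forwards to the same underlying resource. That is why this intermediate state is no less safe, and no safer, than passing device_memory_resource* directly.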
Tasks
- Create an internal/detail CCCL MR implementation called device_memory_resource_view that holds a device_memory_resource* and does not manage MR lifetime but is copyable (unlike device_memory_resource because it is a virtual class) -- Compatibility updates for CCCL 3.2 #2162
- In cccl_adaptors.hpp, add a constructor for cccl_async_resource_ref and cccl_resource_ref from device_memory_resource* that calls the base class constructor with a device_memory_resource_view constructed from the raw pointer. -- Compatibility updates for CCCL 3.2 #2162
- Migrate all APIs accepting device_memory_resource* to perform this conversion to a CCCL MR ref-type and update the per-device resource maps to only use ref types.
Later:
- Deprecate and remove the device_memory_resource base class. This will require heavy refactoring of the Python interfaces and is better to track as part of the broader work in [FEA] Support memory resources from CCCL 3.2 #2011.
- Once device_memory_resource is no longer needed, get rid of device_memory_resource_view and replace cccl_adaptors.hpp with CCCL's upstream ref types.
Thanks @pciolkosz for discussing this design with me.