[FEA] Implement a bridge to help migrate away from device_memory_resource* #2143

@bdice

Motivation

In CCCL < 3.2, we can construct resource_ref types from device_memory_resource*. That ability was removed in CCCL 3.2 as part of the new memory resource design (#2011). The removal was needed for several reasons, including the requirement that resource types be copyable, which lets us reify a resource_ref into an any_resource. This is the core motivation for the new memory resource design in CCCL, and it solves a large number of memory resource lifetime management and language-boundary problems that RMM has faced since its inception.

Because device_memory_resource is an abstract base class with virtual methods, it can't be held by value; we must instead hold a device_memory_resource* that points to a concrete implementation class. This puts us in a catch-22: we can't use CCCL 3.2 with existing RMM APIs, but we can't migrate the rest of RAPIDS until RMM has a solution that works for both CCCL < 3.2 and >= 3.2. We must find a way to bridge this gap.

Goal

We need a way to bridge from our current state where device_memory_resource* is still sometimes used in the API of RMM and consuming libraries, to a future state where everything uses resource_ref types in APIs and uses any_resource for (maybe-shared) ownership.

To do this, we need a new (internal) resource type in RMM that holds a device_memory_resource* but is copyable, thus meeting the requirements of CCCL resources. This type does not attempt to solve the lifetime problems that come with raw pointers; those are the status quo in RMM (it is documented that memory resources must outlive their allocations). I propose calling this type device_memory_resource_view and putting it in a detail:: namespace unless we find some compelling reason to do otherwise.

This type is only meant to be a stopgap until we can fully migrate to a resource_ref / any_resource design across all of RAPIDS. At that point, we will remove the virtual base class device_memory_resource, along with all APIs accepting or using device_memory_resource*, in favor of a design based purely on CCCL MR concepts. Migrating to that state directly, without an interim solution, is hard, so this type fills the gap. The final state will not have unsafe lifetime issues because of the (maybe-shared) ownership design of any_resource, and the intermediate state is no less safe than RMM's existing pointer-based design.
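To make the proposal concrete, here is a minimal sketch of what a non-owning, copyable view could look like. Everything below is a simplified stand-in and not RMM's or CCCL's actual code: the `device_memory_resource` base is reduced to two methods, `counting_host_resource` is a hypothetical test resource backed by host `malloc` so the sketch runs without a GPU, and the view omits the stream, alignment, and property-query parts that a real CCCL resource would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Simplified stand-in for rmm::mr::device_memory_resource (assumption: the
// real class has more members). It is abstract, so it can only be used
// through a pointer or reference to a concrete implementation.
class device_memory_resource {
 public:
  virtual ~device_memory_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes) = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

// Hypothetical concrete resource: host malloc/free plus a counter of
// outstanding allocations, used only to observe the view's behavior.
class counting_host_resource final : public device_memory_resource {
 public:
  int outstanding() const noexcept { return outstanding_; }

 private:
  void* do_allocate(std::size_t bytes) override {
    ++outstanding_;
    return std::malloc(bytes);
  }
  void do_deallocate(void* ptr, std::size_t) noexcept override {
    --outstanding_;
    std::free(ptr);
  }
  int outstanding_{0};
};

namespace detail {
// Copyable, non-owning wrapper around a raw device_memory_resource*.
// It does not extend the pointee's lifetime: the caller must keep the
// underlying resource alive, exactly as with the raw pointer today.
class device_memory_resource_view {
 public:
  explicit device_memory_resource_view(device_memory_resource* mr) noexcept : mr_{mr} {}

  void* allocate(std::size_t bytes) { return mr_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { mr_->deallocate(ptr, bytes); }

  // Two views compare equal when they refer to the same resource object.
  friend bool operator==(device_memory_resource_view const& lhs,
                         device_memory_resource_view const& rhs) noexcept {
    return lhs.mr_ == rhs.mr_;
  }
  friend bool operator!=(device_memory_resource_view const& lhs,
                         device_memory_resource_view const& rhs) noexcept {
    return !(lhs == rhs);
  }

 private:
  device_memory_resource* mr_;
};
}  // namespace detail

static_assert(std::is_abstract<device_memory_resource>::value,
              "the base class cannot be held by value");
static_assert(std::is_copy_constructible<detail::device_memory_resource_view>::value,
              "the view is copyable even though the base class is not");
```

Copies of a view all forward to the same underlying resource, so memory allocated through one copy can be released through another. The pointer-identity comparison is one possible choice for equality in this sketch; the real type may define it differently.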

Tasks

  1. Create an internal/detail CCCL MR implementation called device_memory_resource_view that holds a device_memory_resource*, does not manage the MR's lifetime, and is copyable (unlike device_memory_resource, which is an abstract class) -- Compatibility updates for CCCL 3.2 #2162
  2. In cccl_adaptors.hpp, add a constructor for cccl_async_resource_ref and cccl_resource_ref from device_memory_resource* that calls the base class constructor with a device_memory_resource_view constructed from the raw pointer. -- Compatibility updates for CCCL 3.2 #2162
  3. Migrate all APIs accepting device_memory_resource* to perform this conversion to a CCCL MR ref-type and update the per-device resource maps to only use ref types.
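
Tasks 2 and 3 could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of cccl_adaptors.hpp: `toy_resource_ref` is a hypothetical stand-in for the upstream ref base class (it type-erases a copy of any copyable resource), `resource_ref_sketch` stands in for cccl_resource_ref, and `allocate_scratch` is an invented API used only to show a call site. The base class and view are the same simplified stand-ins as before, repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <type_traits>

// Simplified stand-in for rmm::mr::device_memory_resource.
class device_memory_resource {
 public:
  virtual ~device_memory_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes) = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

// Hypothetical host-backed resource that counts outstanding allocations.
class counting_host_resource final : public device_memory_resource {
 public:
  int outstanding() const noexcept { return outstanding_; }

 private:
  void* do_allocate(std::size_t bytes) override { ++outstanding_; return std::malloc(bytes); }
  void do_deallocate(void* ptr, std::size_t) noexcept override { --outstanding_; std::free(ptr); }
  int outstanding_{0};
};

namespace detail {
// Copyable, non-owning view over a raw pointer (task 1).
class device_memory_resource_view {
 public:
  explicit device_memory_resource_view(device_memory_resource* mr) noexcept : mr_{mr} {}
  void* allocate(std::size_t bytes) { return mr_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { mr_->deallocate(ptr, bytes); }

 private:
  device_memory_resource* mr_;
};
}  // namespace detail

// Hypothetical stand-in for the upstream ref base class. It only accepts
// copyable resources and stores its own copy behind std::function.
class toy_resource_ref {
 public:
  template <typename Resource>
  explicit toy_resource_ref(Resource res)
    : alloc_{[res](std::size_t n) mutable { return res.allocate(n); }},
      dealloc_{[res](void* p, std::size_t n) mutable { res.deallocate(p, n); }} {
    static_assert(std::is_copy_constructible<Resource>::value,
                  "resources must be copyable");
  }
  void* allocate(std::size_t n) { return alloc_(n); }
  void deallocate(void* p, std::size_t n) { dealloc_(p, n); }

 private:
  std::function<void*(std::size_t)> alloc_;
  std::function<void(void*, std::size_t)> dealloc_;
};

// Task 2: the adaptor gains a constructor from device_memory_resource* that
// forwards a view to the base class constructor. It is intentionally
// implicit so existing pointer-passing call sites keep compiling.
class resource_ref_sketch : public toy_resource_ref {
 public:
  resource_ref_sketch(device_memory_resource* mr)
    : toy_resource_ref(detail::device_memory_resource_view{mr}) {}
};

// Task 3: an API that used to take device_memory_resource* now takes the
// ref type; the conversion happens at the call boundary.
inline void* allocate_scratch(resource_ref_sketch mr, std::size_t bytes) {
  return mr.allocate(bytes);
}
```

With this shape, a caller that still holds a raw pointer can write `allocate_scratch(&mr, 128)` unchanged, while the function body only ever sees the ref type.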

Later:

  • Deprecate and remove the device_memory_resource base class. This will require heavy refactoring of the Python interfaces and is better tracked as part of the broader work in [FEA] Support memory resources from CCCL 3.2 #2011.
  • Once device_memory_resource is no longer needed, get rid of device_memory_resource_view and replace cccl_adaptors.hpp with CCCL's upstream ref types.

Thanks @pciolkosz for discussing this design with me.
