I implemented DLPack v1 for CuPy (see cupy/cupy#8683) and made two choices that matter for other implementations and maybe for the spec:
1. We chose to export the `cudaManaged` device when possible, even if `dl_device=(CPU, 0)` was requested. I.e. we promise that the data can be used on the `CPU` device, but CuPy currently still gives you the actual (compatible) device!

   Note: NumPy is OK with this in the case of CUDA managed memory, but it may not yet be OK with it for future or other similar devices. (I.e. NumPy may need to trust the producer in this case, or we just keep it a bit fuzzy and assume the consumer should know the device, possibly based on version.)

2. The user passes `dl_device=(CPU, 0), stream=...`. We had discussed that the stream semantics must relate to the device that the data is on, I think. CuPy supports this:

   - `stream=None` (or nothing passed) will synchronize the device-to-host copy (i.e. wait until the data is available on the CPU).
   - `stream=consumer_stream` will not synchronize. The user could in theory work with the data on `consumer_stream` (e.g. another `cudaAsyncCopy`), or synchronize themselves (e.g. if multiple copies are needed).

   REASON: Synchronizing in the second case would achieve nothing that `stream=None` doesn't already achieve. It would effectively do the same as `stream=None` and also synchronize the `consumer_stream`. (But that stream does not need to be synchronized!)
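To make the two choices concrete, here is a rough pure-Python model of the decisions described above. It is only a sketch and not CuPy's actual code: the helper names `reported_device` and `producer_synchronizes_for_cpu_export` are hypothetical, devices are modeled as plain `(device_type, device_id)` tuples, and the fallback of honoring any other requested device is an assumption. The enum values are the ones from `dlpack.h`.

```python
from enum import IntEnum
from typing import Optional, Tuple


class DLDeviceType(IntEnum):
    # Numeric values as defined in dlpack.h.
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAManaged = 13


Device = Tuple[int, int]
CPU: Device = (DLDeviceType.kDLCPU, 0)


def reported_device(data_device: Device, dl_device: Optional[Device]) -> Device:
    """Hypothetical helper: which device the exported tensor reports."""
    if dl_device is None:
        # No cross-device request: report where the data lives.
        return data_device
    if dl_device == CPU and data_device[0] == DLDeviceType.kDLCUDAManaged:
        # Choice 1: managed memory is usable from the CPU, so no copy is
        # needed and the actual (compatible) device is reported.
        return data_device
    # Assumption in this sketch: any other request is honored via a copy
    # (e.g. a device-to-host copy for CUDA -> CPU).
    return dl_device


def producer_synchronizes_for_cpu_export(stream: Optional[int]) -> bool:
    """Hypothetical helper: choice 2, only for dl_device=(CPU, 0).

    stream=None: the producer waits until the data is available on the CPU.
    stream=<consumer stream>: no synchronization; the consumer orders its
    own work on that stream or synchronizes itself.
    """
    return stream is None
```

With this model, a managed-memory array exported with `dl_device=(CPU, 0)` still reports `(kDLCUDAManaged, 0)`, while a plain CUDA array reports `(kDLCPU, 0)` after the copy.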
CC @leofang.