Description
Hello,
I have a question regarding the handling of GPU device memory in TensorRT 10.3. Here is the situation I'm facing:
Context:
TensorRT 8: With TensorRT 8 we could run inference entirely on the GPU using execute_async_v2. By passing the device pointers of the input and output tensors as bindings, everything, including the inference results, stayed in GPU memory (a minimal sketch of that call pattern follows this list).
TensorRT 10: In TensorRT 10, execute_async_v3 replaces the bindings list, and tensor addresses are set by name on the execution context. In our current setup we allocate host and device memory for every input and output and use cuda.memcpy to transfer results between them.
Current Process: For the input tensors we pass the device pointer of an existing tensor directly, so no extra host/device copy is needed on that side. For the outputs, however, we currently perform a device-to-host copy to retrieve the inference results (allocation code below, and a sketch of the execute/copy step after it).
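For reference, a minimal sketch of the TensorRT 8 call pattern described above, assuming `context` is an IExecutionContext built from the engine and `d_input` / `d_output` are pre-allocated device buffers (the names are illustrative):

```python
# TensorRT 8 style (illustrative): every tensor is a device pointer in the bindings list
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context

stream = cuda.Stream()
# d_input / d_output: device allocations (cuda.mem_alloc) or torch .data_ptr() values
bindings = [int(d_input), int(d_output)]
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
stream.synchronize()
```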
```python
# allocate input / output memory for every I/O tensor
# (module-level imports assumed: import tensorrt as trt; import pycuda.driver as cuda; import pycuda.autoinit)
for i in range(self.engine.num_io_tensors):
    tensor_name = self.engine.get_tensor_name(i)
    tensor_shape = self.engine.get_tensor_shape(tensor_name)
    size = trt.volume(tensor_shape)
    dtype = trt.nptype(self.engine.get_tensor_dtype(tensor_name))
    # page-locked host buffer plus a matching device buffer
    host_mem = cuda.pagelocked_empty(size, dtype=dtype)
    device_mem = cuda.mem_alloc(host_mem.nbytes)
    self.bindings.append(int(device_mem))
    if self.engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        self.inputs.append({'name': tensor_name, 'host': host_mem, 'device': device_mem})
    else:
        self.outputs.append({'name': tensor_name, 'host': host_mem, 'device': device_mem, 'shape': tensor_shape})

# point the first input at an existing device tensor instead of the freshly allocated buffer
self.inputs[0]["device"] = int(some_tensor.data_ptr())
```
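The execute step that goes with this currently looks roughly like the sketch below (`self.context` is the IExecutionContext and `stream` is a pycuda Stream, both assumed from the rest of the class). The final loop is the device-to-host copy we would like to avoid:

```python
# bind every tensor address by name, run inference, then copy outputs back to the host
for inp in self.inputs:
    self.context.set_tensor_address(inp['name'], int(inp['device']))
for out in self.outputs:
    self.context.set_tensor_address(out['name'], int(out['device']))

self.context.execute_async_v3(stream_handle=stream.handle)

# device-to-host copy we would like to eliminate
for out in self.outputs:
    cuda.memcpy_dtoh_async(out['host'], out['device'], stream)
stream.synchronize()
```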
Question:
Is there a way in TensorRT 10.3 to obtain the inference results directly using the device memory pointer, without the need for an additional cuda.memcpy operation to transfer the data back to host memory? Essentially, I'm looking for a method to access the results directly from the GPU device memory.
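To make the goal concrete, here is a hypothetical sketch of what we are hoping is possible: binding the output address to a pre-allocated torch CUDA tensor so the result never leaves the GPU. `out_tensor`, its dtype, and the reuse of `stream` are assumptions, not working code we have:

```python
# hypothetical: keep the inference result on the GPU by binding the output to a torch tensor
import torch

out = self.outputs[0]
out_tensor = torch.empty(tuple(out['shape']),
                         dtype=torch.float32,  # assumed to match the engine's output dtype
                         device='cuda')
self.context.set_tensor_address(out['name'], int(out_tensor.data_ptr()))
self.context.execute_async_v3(stream_handle=stream.handle)
stream.synchronize()
# out_tensor now holds the result, still in device memory -- no memcpy_dtoh needed
```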
Thank you for your assistance!
Environment
torch: 2.0.1
numpy: 1.26.4
python: 3.10.14
TensorRT Version:
tensorrt: 10.3.0
tensorrt-cu12: 10.3.0
tensorrt-cu12-bindings: 10.3.0
tensorrt-cu12-libs: 10.3.0
NVIDIA GPU:
Tesla V100-PCIE-32GB
NVIDIA Driver Version:
535.183.01
CUDA Version:
12.2
Operating System:
Ubuntu