
How to Obtain GPU Device Memory Values Directly in TensorRT 10.3? #4096

Closed
ryukh4520 opened this issue Aug 26, 2024 · 2 comments

Comments

@ryukh4520

Description

Hello,

I have a question regarding the handling of GPU device memory in TensorRT 10.3. Here is the situation I'm facing:

Context:

  • TensorRT 8: With execute_async_v2, we could run inference entirely on the GPU. By passing the device pointers of the input and output tensors as bindings, everything, including the inference results, stayed in GPU memory.

  • TensorRT 10: With execute_async_v3, bindings can no longer be passed that way. We now set up host/device memory for the inputs and outputs and use explicit cuda.memcpy calls to transfer data between them.

  • Current Process: For input tensors, we can bind the device pointer of an existing GPU tensor directly, so inference proceeds without copying data back and forth between host and device memory. For the output, however, we currently perform a device-to-host copy to retrieve the inference results. The allocation code is shown below, followed by a sketch of the inference step.

Allocate input / output memory:

```python
# Allocate a page-locked host buffer and a device buffer for every I/O tensor.
for i in range(self.engine.num_io_tensors):
    tensor_name = self.engine.get_tensor_name(i)
    tensor_shape = self.engine.get_tensor_shape(tensor_name)

    size = trt.volume(tensor_shape)
    dtype = trt.nptype(self.engine.get_tensor_dtype(tensor_name))

    host_mem = cuda.pagelocked_empty(size, dtype=dtype)  # pinned host buffer
    device_mem = cuda.mem_alloc(host_mem.nbytes)         # device buffer

    self.bindings.append(int(device_mem))

    if self.engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        self.inputs.append({'name': tensor_name, 'host': host_mem, 'device': device_mem})
    else:
        self.outputs.append({'name': tensor_name, 'host': host_mem, 'device': device_mem,
                             'shape': tensor_shape})

# Point the first input binding directly at an existing GPU tensor,
# so no host-to-device copy is needed for the input.
self.inputs[0]["device"] = int(some_tensor.data_ptr())
```
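For reference, the inference step in TensorRT 10 then registers these pointers by tensor name and launches with execute_async_v3. The sketch below is only illustrative and assumes `self.context` is an IExecutionContext and `self.stream` is a pycuda Stream (neither appears in the snippet above):

```python
# Sketch of the TensorRT 10 inference step using the buffers allocated above.
import pycuda.driver as cuda

def infer(self):
    for inp in self.inputs:
        # In the setup above, the first input already points at a GPU tensor,
        # so its host->device copy can be skipped.
        cuda.memcpy_htod_async(inp['device'], inp['host'], self.stream)
        self.context.set_tensor_address(inp['name'], int(inp['device']))

    for out in self.outputs:
        self.context.set_tensor_address(out['name'], int(out['device']))

    # Launch inference asynchronously on the stream.
    self.context.execute_async_v3(stream_handle=self.stream.handle)

    # Device -> host copy of the results: the extra step the question asks to avoid.
    for out in self.outputs:
        cuda.memcpy_dtoh_async(out['host'], out['device'], self.stream)

    self.stream.synchronize()
```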

Question:
Is there a way in TensorRT 10.3 to obtain the inference results directly using the device memory pointer, without the need for an additional cuda.memcpy operation to transfer the data back to host memory? Essentially, I'm looking for a method to access the results directly from the GPU device memory.

Thank you for your assistance!

Environment

  • torch: 2.0.1
  • numpy: 1.26.4
  • python: 3.10.14

TensorRT Version:

  • tensorrt: 10.3.0
  • tensorrt-cu12: 10.3.0
  • tensorrt-cu12-bindings: 10.3.0
  • tensorrt-cu12-libs: 10.3.0

NVIDIA GPU:

  • Tesla V100-PCIE-32GB

NVIDIA Driver Version:

  • 535.183.01

CUDA Version:

  • 12.2

Operating System:

  • Ubuntu
@ryukh4520
Author

It works the same as for the inputs: just assign the data_ptr address to outputs[0]["device"].

```python
# make an empty tensor for the output
output_shape = [3, 224, 224]
output_tensor = torch.empty([3 * 224 * 224], dtype=somedtype, device="cuda")

outputs[0]["device"] = int(output_tensor.data_ptr())

# after inference, the result values are written into output_tensor
```
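For completeness, a fully device-resident version of this flow, with both input and output bound to torch tensor pointers and no cuda.memcpy at all, might look like the sketch below. It reuses the illustrative names from above (self.context, self.inputs, self.outputs, some_tensor) and assumes a float32 output of shape [3, 224, 224]:

```python
# Sketch: zero-copy inference with both input and output kept on the GPU.
# self.context, self.inputs, self.outputs, some_tensor and the shape/dtype
# are assumptions carried over from the snippets above.
import torch

input_tensor = some_tensor.contiguous()  # existing GPU tensor used as network input
output_tensor = torch.empty(3 * 224 * 224, dtype=torch.float32, device="cuda")

# Bind the raw device pointers by tensor name.
self.context.set_tensor_address(self.inputs[0]['name'], int(input_tensor.data_ptr()))
self.context.set_tensor_address(self.outputs[0]['name'], int(output_tensor.data_ptr()))

# Run inference on the current torch CUDA stream; the results are written
# straight into output_tensor, so no device-to-host copy is needed.
stream = torch.cuda.current_stream()
self.context.execute_async_v3(stream_handle=stream.cuda_stream)
stream.synchronize()

result = output_tensor.view(3, 224, 224)  # reshape on the GPU
```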

@jinhonglu

#4330

I'm facing a problem: when I bind the pointer of a CUDA memory buffer directly, I get a different result than when I copy the input data in from a numpy array.

Did you face a similar problem?
