
[Question] What does device / embedding_specs.compute_device parameter in ctor of TBE mean? #2395

Open
JacoCheung opened this issue Mar 6, 2024 · 4 comments


@JacoCheung

Hi team, I am confused by the following parameters related to the device context in the TBE constructor.

What combinations of those parameters are legal? Could anyone provide some hints?

Thanks!

@q10
Contributor

q10 commented Mar 6, 2024

Hi @JacoCheung

ComputeDevice specifies the TBE kernel variant (i.e. whether the kernel will execute on the CPU or on CUDA) that will be applied to each embedding table.

EmbeddingLocation specifies the target memory location of the embedding tables that are constructed by the operator (i.e. CUDA-device-only, managed (UVM), managed + caching, or host-only).

device specifies the target location of memory buffers used internally by the operator.

There is a list of constraints checked at runtime during TBE construction, including (a short construction sketch follows this list):

  1. ComputeDevice values should be the same across all embedding tables
  2. EmbeddingLocation values can be different
  3. But if use_cpu is set or optimizer is set to None, EmbeddingLocation values can only be set to HOST
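
For reference, here is a minimal construction sketch, assuming the import paths and the `(num_embeddings, embedding_dim, EmbeddingLocation, ComputeDevice)` spec layout from recent fbgemm_gpu versions; please double-check against the version you have installed:

```python
import torch
from fbgemm_gpu.split_embedding_configs import EmbOptimType as OptimType
from fbgemm_gpu.split_table_batched_embeddings_ops_training import (
    ComputeDevice,
    EmbeddingLocation,
    SplitTableBatchedEmbeddingBagsCodegen,
)

# Constraint 1: ComputeDevice is CUDA for every table.
# Constraint 2: EmbeddingLocation may still differ per table.
tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[
        # (num_embeddings, embedding_dim, EmbeddingLocation, ComputeDevice)
        (1_000_000, 128, EmbeddingLocation.DEVICE, ComputeDevice.CUDA),
        (2_000_000, 64, EmbeddingLocation.MANAGED, ComputeDevice.CUDA),
    ],
    optimizer=OptimType.EXACT_ROWWISE_ADAGRAD,
    # `device` controls where the memory buffers used internally by the
    # operator are placed, independently of the per-table EmbeddingLocation.
    device=torch.device("cuda:0"),
)
```

A CPU-only variant would instead use ComputeDevice.CPU together with EmbeddingLocation.HOST for every table (constraint 3 above when use_cpu is set).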

If the constraints are not met, an error with a detailed message will be thrown, which will guide you toward constructing the TBE with a correct combination of parameters.

We will update our docs to explain this in more detail. Let us know if you have other questions. cc @sryap

@JacoCheung
Author

@q10 Thanks for your reply! What if ComputeDevice == CPU while EmbeddingLocation == DEVICE (GPU)? Will the kernel spawn some D2H memcpys?

@q10
Contributor

q10 commented Mar 7, 2024

@JacoCheung
Author

Thanks @q10. I have another question (though it may be beyond this issue's scope).

How are the physical tables allocated (assuming EmbeddingLocation is DEVICE for all tables)? Do multiple tables share a single memory chunk?

For example, suppose I have 2 embedding tables with different embedding dimensions. Will there be 2 separate memory buffers or a single one? And how many lookup kernels will be launched during the forward pass? (A sketch of this setup is below.)
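
For concreteness, here is a sketch of the setup I have in mind (table sizes and dimensions are made up; I am assuming the same spec layout as in the example above):

```python
import torch
from fbgemm_gpu.split_table_batched_embeddings_ops_training import (
    ComputeDevice,
    EmbeddingLocation,
    SplitTableBatchedEmbeddingBagsCodegen,
)

# Two tables, both placed in GPU memory, with different embedding dims.
tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[
        (100_000, 128, EmbeddingLocation.DEVICE, ComputeDevice.CUDA),
        (50_000, 32, EmbeddingLocation.DEVICE, ComputeDevice.CUDA),
    ],
    device=torch.device("cuda:0"),
)
# Do the two tables end up in one fused buffer or two separate ones,
# and how many lookup kernels does a single forward() launch?
```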
