2x slowdown using TP #117

jph00 · 2023-08-26T18:04:20Z

Thanks for this very interesting lib @BlackSamorez! I just tried running your Kaggle notebook locally on 2x A6000s with the 'meta-llama/Llama-2-7b-hf' model. The call to model.generate with max_length=200 takes 12 seconds when using tensor_parallel across the 2 GPUs.

However, if I remove tensor_parallel and instead just use a single GPU, generation is over twice as fast, taking 5 seconds.

Is this slowdown expected, or am I doing something wrong?
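
For reference, here is a minimal sketch of how the two timings could be compared. It is not the notebook's actual code: `time_call` and `slowdown` are hypothetical helper names, and the warmup and repeat counts are arbitrary choices. When timing GPU work, it is probably worth calling `torch.cuda.synchronize()` inside the timed function, since CUDA kernels launch asynchronously.

```python
import time
from statistics import median


def time_call(fn, warmup=1, repeats=3):
    """Return the median wall-clock seconds of fn() over `repeats` runs.

    Hypothetical helper: `fn` would wrap something like
    lambda: model.generate(**inputs, max_length=200)
    for either the tensor_parallel model or the single-GPU model.
    """
    # Warmup runs are discarded so one-time setup cost is not measured.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def slowdown(tp_seconds, single_seconds):
    """Ratio of tensor-parallel time to single-GPU time (>1 means TP is slower)."""
    return tp_seconds / single_seconds


# Using the timings reported above (12 s with 2 GPUs, 5 s with 1 GPU):
print(f"{slowdown(12.0, 5.0):.1f}x")  # prints "2.4x"
```

Taking the median over a few runs after a warmup should make the comparison less sensitive to one-off costs such as kernel compilation or cache allocation.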
