[Kernel] add bfloat16 support for gptq kernel #4781

jinzhen-lin · 2024-05-13T08:58:19Z

Some models overflow under fp16 inference (e.g. Deepseek-V2), so the quantization kernels should support bfloat16. This PR adds bfloat16 support to the GPTQ kernel.
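To illustrate the overflow motivation, here is a minimal pure-Python sketch of the range difference between the two formats. The helper names `to_fp16` and `to_bf16` are hypothetical and not part of this PR, and the bfloat16 emulation truncates the low 16 bits of the float32 encoding rather than rounding to nearest as hardware typically does.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE float16, returning +/-inf when the value is out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the float16 range;
        # map that to infinity, which is what an fp16 overflow produces.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: truncation is used here for simplicity, not round-to-nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


activation = 70000.0  # just above the largest finite float16 value (65504)
print("fp16:", to_fp16(activation))
print("bf16:", to_bf16(activation))
```

bfloat16 keeps the 8-bit exponent of float32, so the value stays finite at the cost of mantissa precision, while float16 saturates to infinity.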

Related issue: #2149

Main changes:

- Add bfloat16 input/output support to the CUDA kernels.
- Dequantize qweight to bfloat16 correctly.

NOTE: Currently, the bfloat16 kernel may be much slower than the float16 kernel on devices with compute capability >= sm80 and < sm90, since atomicAdd with bfloat16 is not native there (see the description of atomicAdd). Increasing BLOCK_KN_SIZE improves performance considerably, but I'm not sure whether this affects other situations.
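As a rough sketch of the device range described in this note, the check below flags compute capabilities where the bfloat16 path is expected to be slow. The function name `in_slow_bf16_range` is hypothetical and not part of this PR. The `(major, minor)` tuple is assumed to come from something like `torch.cuda.get_device_capability()`, and the sm80/sm90 bounds are taken from the note itself rather than verified independently.

```python
from typing import Tuple


def in_slow_bf16_range(capability: Tuple[int, int]) -> bool:
    """Return True for sm80 <= capability < sm90, the range the note flags.

    Hypothetical helper: `capability` is a (major, minor) tuple, e.g. the
    value returned by torch.cuda.get_device_capability() on a CUDA device.
    """
    # Tuples compare lexicographically, so (8, 6) falls between (8, 0) and (9, 0).
    return (8, 0) <= capability < (9, 0)


for cap in [(7, 5), (8, 0), (8, 6), (9, 0)]:
    print(cap, in_slow_bf16_range(cap))
```

A caller could use such a check to warn users or to prefer float16 where the model tolerates it.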

mgoin · 2024-05-21T09:27:00Z

@alexm-neuralmagic should we support bfloat16 here the same way as done for gptq_marlin?

jinzhen-lin · 2024-05-22T02:48:48Z

> @alexm-neuralmagic should we support bfloat16 here the same way as done for gptq_marlin?

The bfloat16 GPTQ kernel has a serious performance issue right now; I may optimize it soon.

github-actions · 2024-10-27T02:06:28Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

mgoin · 2024-10-28T14:53:00Z

This was resolved long ago on SM >= 80, since the GPTQ Marlin kernels support bfloat16. Closing as resolved.

vllm/vllm/model_executor/layers/quantization/gptq_marlin.py

Lines 74 to 76 in 2adb440

    
    @classmethod
    def get_supported_act_dtypes(cls) -> List[torch.dtype]:
        return [torch.half, torch.bfloat16]

jinzhen-lin added 2 commits May 13, 2024 16:48

add bfloat16 support for gptq

b24457e

Merge branch 'vllm-project:main' into gptq-bf16

a524167

jinzhen-lin mentioned this pull request May 13, 2024

[Kernel] add bfloat16 support for gptq marlin kernel #4788

Merged

github-actions bot added the stale Over 90 days of inactivity label Oct 27, 2024

mgoin closed this Oct 28, 2024
