How to Split AWQ Weights? #626

Azure-Tang · 2024-09-28T07:22:54Z

Body:
Hello,

I am currently working on implementing tensor parallelism and need some guidance on how to split AWQ weights properly. Here's the current state of the AWQ weights I'm working with:

print("Qweight Shape:", self.qweight.shape)  # torch.Size([3584, 4096])
print("Scales Shape:", self.scales.shape)    # torch.Size([32, 14336])
print("Scaled Zeros Shape:", self.scaled_zeros.shape)  # torch.Size([32, 14336])

To split the layer in half along the output dimension, I took the left half of each tensor as follows:

qweight_left = self.qweight[:1792, :]
scales_left = self.scales[:, :7168]
scaled_zeros_left = self.scaled_zeros[:, :7168]
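
For comparison, here is a minimal sketch of the same column-parallel split on a plain, unpacked group-quantized weight. The layout is an assumption made for illustration: `q`, `zeros`, and `dequantize` are hypothetical names, the codes are stored one per element, and nothing here reflects the packed AWQ storage format or its kernels. It only shows that slicing the same output-channel range from all three tensors reproduces the matching slice of the full output when no packing or reordering is involved.

```python
import torch

torch.manual_seed(0)

# Hypothetical unpacked layout (NOT the packed AWQ storage format):
#   q      : [in_features, out_features] integer codes in [0, 15]
#   scales : [in_features // group_size, out_features]
#   zeros  : [in_features // group_size, out_features]
in_features, out_features, group_size = 256, 512, 128
n_groups = in_features // group_size

q = torch.randint(0, 16, (in_features, out_features)).double()
scales = torch.rand(n_groups, out_features, dtype=torch.float64) * 0.1 + 0.01
zeros = torch.randint(0, 16, (n_groups, out_features)).double()


def dequantize(q, scales, zeros, group_size):
    """Expand per-group scales/zeros along the input dim, then dequantize."""
    s = scales.repeat_interleave(group_size, dim=0)
    z = zeros.repeat_interleave(group_size, dim=0)
    return (q - z) * s


x = torch.randn(1, 8, in_features, dtype=torch.float64)
out = x @ dequantize(q, scales, zeros, group_size)

# Column-parallel split: the same output-channel range from all three tensors.
half = out_features // 2
out_left = x @ dequantize(q[:, :half], scales[:, :half], zeros[:, :half], group_size)

matches = torch.allclose(out_left, out[:, :, :half])
print("split matches full output:", matches)
```

If the real `qweight` packs or interleaves several values per stored element, a row or column slice of the packed tensor may not correspond to a contiguous range of output channels, so the slice would have to follow the packing order instead.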

I also created a random input of shape (1, 2048, 4096) and ran a matrix multiplication with both the original weights (giving out) and the split weights (giving out_left). However, the results do not match:

>>> torch.allclose(out_left, out[:,:,:7168])
False
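
Since `torch.allclose` defaults to `rtol=1e-05` and `atol=1e-08`, a `False` result does not by itself say how large the mismatch is. The sketch below uses a hypothetical helper, `compare`, with tolerances chosen only for illustration, to report the maximum absolute difference alongside a looser check. Whether the mismatch above is rounding noise or a genuinely wrong split is not known from the output shown.

```python
import torch


def compare(a, b, rtol=1e-3, atol=1e-3):
    """Report the max absolute difference and a looser allclose result."""
    a32, b32 = a.float(), b.float()  # compare in float32 regardless of input dtype
    max_abs = (a32 - b32).abs().max().item()
    return {
        "max_abs_diff": max_abs,
        "allclose": torch.allclose(a32, b32, rtol=rtol, atol=atol),
    }


# Toy tensors standing in for out[:, :, :7168] and out_left.
ref = torch.tensor([1.0, 2.0, 3.0])
noisy = ref + 1e-4

strict = torch.allclose(noisy, ref)  # default tolerances
report = compare(noisy, ref)
print("strict:", strict, "report:", report)
```

A small `max_abs_diff` relative to the output magnitude would point to numerical noise, while a large one would point to the slices not lining up with the intended output channels.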

Could someone advise on how to correctly split the AWQ weights to achieve effective tensor parallelism? Any help or suggestions would be greatly appreciated!

Thank you!

Azure-Tang changed the title ~~Implementing Tensor Parallel by Splitting AWQ Weights in PyTorch~~ How to Split AWQ Weights? Sep 28, 2024
