Commit df20c15

reduce memory usage during repack (#168)
1 parent e52b6f8 commit df20c15

1 file changed (+1, -0 lines changed)

qllm/auto_model_quantization.py

Lines changed: 1 addition & 0 deletions
@@ -139,6 +139,7 @@ def repack_to_new_mode(self, model, new_pack_mode):
 new_module.bias = qlayer.bias if qlayer.bias is not None else None
 set_op_by_name(model, module_name, new_module)
 new_module.pack(qlayer, scales.T, zeros.T, qlayer.g_idx)
+del qlayer.weight
 qlayer.to('cpu')
 new_module.to('cpu')
 del qlayers
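
The commit message gives no rationale beyond "reduce memory usage during repack", so the following is only one plausible reading of the added line: once `new_module.pack(...)` has consumed `qlayer`, the old layer's `weight` is no longer needed, and deleting the attribute right away lets it be reclaimed before the next layer is processed instead of only at `del qlayers`. The sketch below illustrates that idea in plain Python. `Buffer`, `Layer`, and `repack` are hypothetical stand-ins and are not part of qllm or PyTorch.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large weight tensor."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class Layer:
    """Hypothetical stand-in for an already-quantized layer."""

    def __init__(self, nbytes):
        self.weight = Buffer(nbytes)


def repack(layers, free_eagerly):
    """Walk the layers the way a repack loop would and return how many
    source weight buffers are still alive once the loop has finished."""
    trackers = []
    for layer in layers:
        # Track the buffer without keeping it alive.
        trackers.append(weakref.ref(layer.weight))
        # ... a real repack would build and pack the new module here ...
        if free_eagerly:
            # Assumption: the layer holds the only reference to its weight,
            # so deleting the attribute makes the buffer collectable now.
            del layer.weight
    gc.collect()
    return sum(1 for ref in trackers if ref() is not None)


layers_kept = [Layer(1024) for _ in range(4)]
layers_freed = [Layer(1024) for _ in range(4)]

alive_without_del = repack(layers_kept, free_eagerly=False)
alive_with_del = repack(layers_freed, free_eagerly=True)
print(alive_without_del, alive_with_del)
```

Without the eager delete, every source buffer survives until the whole list of layers is dropped; with it, each buffer becomes collectable as soon as its layer has been handled. Whether the saving is significant in the real method depends on what else still references `qlayer.weight`, which this page does not show.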
