Background
#624 added support for distributed weight compression, which parallelizes weight compression across distributed workflows. Weight decompression, however, has not been parallelized, so decompressing a model can take a long time, whether during transformers inference or when a user decompresses a model directly.
Requested Changes
Implement distributed decompression (ModelCompressor.decompress_model). This requires every subclass of BaseCompressor to support BaseCompressor.decompress_module(module) on modules whose parameters are on the meta device.
Please add tests to verify that BaseCompressor.decompress_module and BaseCompressor.compress_module work for meta modules for all subclasses.
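As a rough illustration of what such a test could check, here is a minimal sketch. It does not use the real compressor classes: `ToyHalfCompressor` and `check_meta_roundtrip` are hypothetical names invented for this example, and the float16 cast only stands in for a real compression format. The assumption is that a meta-compatible compressor never reads weight data and only rewrites parameter shapes and dtypes.

```python
import torch
from torch import nn


class ToyHalfCompressor:
    """Hypothetical stand-in for a BaseCompressor subclass (not the real API)."""

    @staticmethod
    def compress_module(module: nn.Module) -> None:
        # Only metadata changes here, so this works without real storage.
        packed = module.weight.detach().to(torch.float16)
        module.weight = nn.Parameter(packed, requires_grad=False)

    @staticmethod
    def decompress_module(module: nn.Module) -> None:
        restored = module.weight.detach().to(torch.float32)
        module.weight = nn.Parameter(restored, requires_grad=False)


def check_meta_roundtrip(compressor, module: nn.Module) -> dict:
    """Compress then decompress, asserting parameters never leave the meta device."""
    compressor.compress_module(module)
    assert all(p.is_meta for p in module.parameters())
    compressor.decompress_module(module)
    assert all(p.is_meta for p in module.parameters())
    # Return shape and dtype per parameter so the caller can compare them.
    return {
        name: (tuple(p.shape), p.dtype)
        for name, p in module.named_parameters()
    }


# Build a module whose parameters are allocated on the meta device.
linear = nn.Linear(8, 4, device="meta")
result = check_meta_roundtrip(ToyHalfCompressor, linear)
```

In an actual test suite, the same check would presumably be parametrized over every BaseCompressor subclass, with the expected compressed shapes and dtypes taken from each format.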