[Feature] Distributed model decompression #671

Description

@kylesayrs

Background

#624 added support for distributed weight compression, which parallelizes weight compression in distributed workflows. Weight decompression has not been parallelized, so decompressing a model can take a long time, both during transformers inference and in any other case where a user wants to decompress a model.

Requested Changes

Implement distributed decompression (ModelCompressor.decompress_model). This requires every subclass of BaseCompressor to support BaseCompressor.decompress_module(module) on modules whose parameters are on the meta device.

Please add tests to verify that, for all subclasses, BaseCompressor.decompress_module and BaseCompressor.compress_module work on modules whose parameters are on the meta device.
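
As a rough illustration of what parallelizing decompression could look like, the sketch below splits module names across ranks round-robin, so that each rank would decompress only the modules it owns. This is only one possible scheme and is not taken from #624, whose approach is not described here. The helper `partition_modules`, the `world_size` parameter, and the module names are all hypothetical and are not part of the library.

```python
from typing import Dict, List, Sequence


def partition_modules(names: Sequence[str], world_size: int) -> Dict[int, List[str]]:
    """Assign module names to ranks round-robin (hypothetical helper)."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    assignment: Dict[int, List[str]] = {rank: [] for rank in range(world_size)}
    for index, name in enumerate(names):
        # Module i goes to rank i mod world_size, so the load stays
        # roughly balanced when modules are of similar size.
        assignment[index % world_size].append(name)
    return assignment


# Hypothetical module names, for illustration only.
module_names = [
    "layers.0.q_proj",
    "layers.0.k_proj",
    "layers.0.v_proj",
    "layers.1.q_proj",
    "layers.1.k_proj",
]
assignment = partition_modules(module_names, world_size=2)
for rank, owned in assignment.items():
    print(rank, owned)
```

Under this assumed scheme, a rank would treat the modules it does not own as placeholders, which is presumably why decompress_module needs to work when parameters are on the meta device.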
