Remove unused parameters before serializing #9659
Labels
module: runtime
Issues related to the core runtime and code under runtime/
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
Now that export has switched to non-strict mode by default, unused parameters are left in the graph. As a result, the unquantized weights are serialized alongside the quantized weights, bloating the PTE size by 5x or more. We should strip unused parameters somewhere in to_edge or to_executorch.
As a repro (requiring the latest PyTorch):
When printing the lowered program, note the extra unused f32 weights. You can also observe the PTE size is much larger than expected. Specifically, p_linear1_weight and p_linear2_weight are the original (unquantized) f32 weights and are unused. There is a u8 copy of the weights which is consumed by the delegate as expected.
Versions
All
cc @larryliu0820 @JacobSzwejbka