Remove unused parameters before serializing #9659

Open
GregoryComer opened this issue Mar 26, 2025 · 0 comments
Labels
module: runtime (Issues related to the core runtime and code under runtime/)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


GregoryComer commented Mar 26, 2025

🐛 Describe the bug

Now that torch.export is non-strict by default, unused parameters are left in the graph. As a result, the original unquantized weights get serialized alongside their quantized copies, bloating PTE size by 5x or more. We should strip out unused parameters somewhere in to_edge or to_executorch.
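
One possible shape for such a pass, sketched below. This is a hypothetical helper, not an existing ExecuTorch API; it assumes in-place mutation of the exported program is acceptable, and the graph-signature/state_dict internals it touches may differ across PyTorch versions:

from torch.export import ExportedProgram
from torch.export.graph_signature import InputKind

def strip_unused_parameters(ep: ExportedProgram) -> None:
    # Hypothetical sketch: drop parameter placeholders that have no users,
    # along with their graph-signature entries and state_dict tensors.
    graph = ep.graph_module.graph
    specs_by_name = {spec.arg.name: spec for spec in ep.graph_signature.input_specs}
    for node in list(graph.nodes):
        spec = specs_by_name.get(node.name)
        if (
            node.op == "placeholder"
            and not node.users
            and spec is not None
            and spec.kind == InputKind.PARAMETER
        ):
            graph.erase_node(node)
            ep.graph_signature.input_specs.remove(spec)
            ep.state_dict.pop(spec.target, None)  # assumes state_dict is a mutable mapping
    ep.graph_module.recompile()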

As a repro (requiring the latest PyTorch):

import torch

from executorch.backends.transforms.duplicate_dynamic_quant_chain import (
    DuplicateDynamicQuantChainPass,
)
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)
from torch.export import export, export_for_training
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(16, 1024)
        self.relu1 = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(1024, 16)
        self.relu2 = torch.nn.ReLU()
    
    def forward(self, x):
        x = self.linear1(x)
        x = self.relu1(x)
        x = self.linear2(x)
        x = self.relu2(x)
        return x

model = SimpleModel()
inputs = (torch.randn(1, 16),)

pre_autograd_aten_dialect = export_for_training(
    model,
    inputs,
).module()

quantizer = XNNPACKQuantizer()
#qparams = get_symmetric_quantization_config(is_dynamic=True, is_per_channel=True)
qparams = get_symmetric_quantization_config(is_per_channel=False)
quantizer.set_global(qparams)

prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
prepared_graph.to("cpu")

converted_graph = convert_pt2e(prepared_graph)
DuplicateDynamicQuantChainPass()(converted_graph)

ep = export(converted_graph, inputs, strict=False)
lowered = to_edge_transform_and_lower(
    ep,
    partitioner=[XnnpackPartitioner()]
)

When printing the lowered program, note the extra unused f32 weights. You can also observe the PTE size is much larger than expected. Specifically, p_linear1_weight and p_linear2_weight are the original (unquantized) f32 weights and are unused. There is a u8 copy of the weights which is consumed by the delegate as expected.
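
For example, the serialized size can be checked like this (a quick sketch using the lowered manager from the repro above; the exact byte count will vary):

et_program = lowered.to_executorch()
print(f"PTE size: {len(et_program.buffer)} bytes")  # inflated by the unused f32 weights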

print(lowered.exported_program())
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_linear1_weight: "f32[1024, 16]", p_linear2_weight: "f32[16, 1024]", x: "f32[1, 16]"):
            # No stacktrace found for following nodes
            lowered_module_0 = self.lowered_module_0
            executorch_call_delegate = torch.ops.higher_order.executorch_call_delegate(lowered_module_0, x);  lowered_module_0 = x = None
            getitem: "f32[1, 16]" = executorch_call_delegate[0];  executorch_call_delegate = None
            return (getitem,)
            
Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_linear1_weight'), target='linear1.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_linear2_weight'), target='linear2.weight', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='x'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='getitem'), target=None)])
Range constraints: {}
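
One way to confirm which parameters are dangling is to list placeholder nodes with no users (an ad hoc check, not an official API; reuses the lowered manager from above):

ep = lowered.exported_program()
unused = [
    node.name
    for node in ep.graph_module.graph.nodes
    if node.op == "placeholder" and not node.users
]
print(unused)  # e.g. ['p_linear1_weight', 'p_linear2_weight']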

Versions

All

cc @larryliu0820 @JacobSzwejbka
