
Feat (equalize): enable parametrized scales #1175

Open · pablomlago wants to merge 19 commits into base: dev
Conversation


@pablomlago pablomlago commented Feb 5, 2025

Reason for this PR

Enable parametrized scaling, similarly to rotations (see #1148).

Changes Made in this PR

Refactored the function _cross_layer_equalization and incorporated rewriters for handling fused/unfused scaling.
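
For context, a minimal sketch of the difference between fused and parametrized scaling, using torch.nn.utils.parametrize; the _ScaleWeight class and variable names below are illustrative only and are not the rewriters introduced in this PR.

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class _ScaleWeight(nn.Module):
    # Illustrative parametrization: multiply a weight by a learnable per-channel scale.

    def __init__(self, scale: torch.Tensor, axis: int):
        super().__init__()
        self.scale = nn.Parameter(scale)
        self.axis = axis

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        shape = [1] * weight.ndim
        shape[self.axis] = -1
        return weight * self.scale.view(shape)


linear = nn.Linear(8, 4)
scale = torch.rand(4) + 0.5

# Fused scaling: the weight data is rewritten in place and no extra Parameter remains.
with torch.no_grad():
    linear.weight.mul_(scale.view(-1, 1))

# Parametrized scaling: the original weight is preserved and the scale stays trainable.
other = nn.Linear(8, 4)
parametrize.register_parametrization(other, "weight", _ScaleWeight(scale.clone(), axis=0))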

Testing Summary

Made equalization tests also run with fuse_scaling=False.

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@pablomlago pablomlago changed the base branch from master to dev February 5, 2025 18:35
@pablomlago pablomlago changed the title Feat(equalize): enable parametrized scales Feat (equalize): enable parametrized scales Feb 5, 2025
@pablomlago pablomlago requested a review from Giuseppe5 February 10, 2025 11:50


def update_module_tensor(module: nn.Module, tensor: torch.Tensor, tensor_name: str):
    setattr(module, tensor_name, torch.nn.Parameter(tensor))
Collaborator

Not convinced by this. Either the name of the function must be changed, or we need to control if the new tensor is a parameter or not. Also, how general is this to have it here?

Collaborator Author

I'm fine with removing it; it's mainly a leftover of the WeightBiasWrapper.
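
If the helper is kept instead, one possible way to address the concern about controlling whether the new tensor becomes a Parameter, sketched for illustration only (the name _set_module_tensor is hypothetical, not code from this PR):

import torch
import torch.nn as nn


# Hypothetical variant of the helper discussed above: only wrap the new tensor in an
# nn.Parameter if the attribute it replaces was already one.
def _set_module_tensor(module: nn.Module, tensor: torch.Tensor, tensor_name: str) -> None:
    old = getattr(module, tensor_name, None)
    if isinstance(old, nn.Parameter):
        setattr(module, tensor_name, nn.Parameter(tensor))
    else:
        setattr(module, tensor_name, tensor)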

@@ -308,7 +309,7 @@ def apply(self, model: GraphModule) -> GraphModule:
        tensor = getattr(module, self.tensor_name).data
        tensor = self.transform_module(tensor)
        # Modify the weights in-place
-       setattr(module, self.tensor_name, torch.nn.Parameter(tensor))
+       update_module_tensor(module=module, tensor=tensor, tensor_name=self.tensor_name)
Collaborator

Looks like we only do it once, so let's not create a new function.

"""
Given two adjacent tensors', the weights are scaled such that
the ranges of the first tensors' output channel are equal to the
ranges of the second tensors' input channel
"""
# The names of the attributes containing the tensors to equalize, as well as the axis
Collaborator

Why is this change needed?

Collaborator

I'm not sure I like this solution better than the previous one. Let's see if we can find a compromise

Collaborator Author

The motivation was two-fold:

  • Remove the WeightBiasWrapper, whose only functionality was to make sure that weights are under the attribute "weight". In general, reduce, as much as possible, the places in which we check the type of the module to get the attribute holding the weights (e.g. "in_proj_weight" in MHA).
  • Make the loop in which the parametrizations are added as similar as possible to that of the rotations, which might make it easier to remove duplication in future PRs.

That being said, I'm not a big fan of having those constants either, so I'm open to any proposal.

Collaborator

Can't we expand the WeightBiasWrapper class to also keep track of the original tensor name?
That way we can keep doing module.weight to get the weights (instead of having weight_pos and bias_pos), and keep the name/axis each in their own attribute.
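
Rough sketch of this proposal, for illustration only (the field names are assumptions, not the actual Brevitas implementation):

import torch.nn as nn


class WeightBiasWrapper:
    # Keeps exposing `.weight` while also recording the original attribute name
    # (e.g. "in_proj_weight" for nn.MultiheadAttention) and the equalization axis.

    def __init__(self, module: nn.Module, tensor_name: str = "weight", axis: int = 0):
        self.module = module
        self.tensor_name = tensor_name
        self.axis = axis

    @property
    def weight(self):
        return getattr(self.module, self.tensor_name)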

Collaborator Author

I've expanded the "wrapper" to handle additional logic, so it is more meaningful to keep it.

@@ -500,20 +528,35 @@ def _no_equalize():

        if isinstance(module, nn.MultiheadAttention):
            module = module.out_proj
        src_axes[name] = (module, axis)
        # Bias, if present, needs to be rotated for sources
Collaborator

Rotated?

Collaborator Author

Typo

@@ -1021,6 +1064,29 @@ def apply(self,
        return graph_model


class ScaleBiasMul(nn.Module):
Collaborator

Not sure if I want to have a new class for what's basically a 99% overlap with ScaledBias

Collaborator

If we switch scaling with inverse_scaling, i.e.

scaling_factors = sinks_range / srcs_range

we only need the reciprocal for the weights, and maybe we can get rid of this class?

Collaborator Author

I wanted to make sure that the order of the operations was the same as before, but given that, empirically, the change in the output is negligible, I'll make that change to remove the ScaleBiasMul module.
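
A quick numerical check of why the two formulations are interchangeable (the variable names mirror the comment above, not necessarily the code):

import torch

srcs_range = torch.rand(16) * 4 + 0.5
sinks_range = torch.rand(16) * 4 + 0.5
x = torch.randn(8, 16)

scaling = srcs_range / sinks_range           # current formulation
inverse_scaling = sinks_range / srcs_range   # proposed formulation

# Multiplying by `scaling` vs. dividing by `inverse_scaling` only differs by
# floating-point rounding noise, i.e. the "negligible" change mentioned above.
print(torch.max(torch.abs(x * scaling - x / inverse_scaling)))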

        self.axis = axis
        self.start_end_idxs = start_end_idxs
        self.slice_idxs = slice_idxs
        self.use_inverse_scaling = use_inverse_scaling
Collaborator

Why do we need to have the inverse both here and for the activations?
Only one of the two needs the inverse

Collaborator Author

It is needed in weight equalization: sources are scaled by the scaling factor, and sinks by its inverse.
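
To make that concrete, a self-contained check that scaling sources by a factor and sinks by its reciprocal leaves the composed function unchanged (toy layers for illustration, not the PR's code):

import torch
import torch.nn as nn

torch.manual_seed(0)
src = nn.Linear(6, 4, bias=True)
sink = nn.Linear(4, 3, bias=False)
x = torch.randn(2, 6)

# Per-channel factors on the dimension shared by the source outputs and sink inputs.
scaling_factors = torch.rand(4) + 0.5

with torch.no_grad():
    ref = sink(src(x))
    # Sources (and their bias) are multiplied by the factor, sinks by its inverse,
    # so one of the two sides always needs the reciprocal.
    src.weight.mul_(scaling_factors.view(-1, 1))
    src.bias.mul_(scaling_factors)
    sink.weight.div_(scaling_factors.view(1, -1))
    out = sink(src(x))

print(torch.allclose(ref, out, atol=1e-5))  # True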

sink_broadcast_size = [1] * module.weight.ndim
sink_broadcast_size[axis] = module.weight.size(axis)
insert_mul_node_fn(scaling_factors, act_val_shape, act_axis)
for name, (module, tensor_names_axis) in src_axes.items():
Collaborator

What happens in weight equalization with multiple iterations?

Collaborator Author

Parametrizations are added on top of each other.
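
This follows from how torch.nn.utils.parametrize composes parametrizations registered on the same tensor; a minimal, self-contained illustration (the _Scale class exists only for this example):

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class _Scale(nn.Module):
    # Minimal scale-only parametrization, used only for this illustration.

    def __init__(self, scale: torch.Tensor):
        super().__init__()
        self.scale = nn.Parameter(scale)

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return weight * self.scale.view(-1, 1)


linear = nn.Linear(8, 4)
# Each equalization iteration registers one more parametrization on the same tensor;
# torch chains them, so the stack grows with the number of iterations.
for _ in range(3):
    parametrize.register_parametrization(linear, "weight", _Scale(torch.rand(4) + 0.5))

print(len(linear.parametrizations.weight))  # 3 stacked scales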

Collaborator

Didn't we discuss updating the parametrization? In the case of big models with 100 iterations, does it mean we will have 100 scales per weight?

Collaborator Author

It's unlikely that weight equalization will be used with parametrized scaling at the moment. If needed, logic will be added in the future to fuse the scaling parameters appropriately, so that there is only one scaling-factor Parameter per region, irrespective of the number of iterations.
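
Sketch of what such a fusion could look like (hypothetical helper, not part of this PR): collapse the stacked per-iteration scales of a region into a single Parameter by taking their elementwise product.

import torch


def fuse_region_scales(scales):
    # `scales` is a list of per-channel scale tensors, one per equalization iteration.
    fused = scales[0].detach().clone()
    for scale in scales[1:]:
        fused.mul_(scale.detach())
    return torch.nn.Parameter(fused)


fused = fuse_region_scales([torch.rand(4) + 0.5 for _ in range(100)])
print(fused.shape)  # torch.Size([4]): one scale per channel, regardless of iteration count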
