perf(rules): add withNegation flag to simplify to policy flow #13151

Merged
lukidzi merged 15 commits into master from fix/simplify-rules-for-outbounds
Mar 24, 2025

Conversation

@lukidzi lukidzi commented Mar 20, 2025

Motivation

Note

This is not a problem when using meshServices.mode: Exclusive with new-style policies.

When a user has multiple to policies for a single top-level target reference and a default to Mesh target, CPU usage spikes due to the expensive matching process.
[Image: Screenshot 2025-03-20 at 18 43 36]

Implementation information

After analyzing the issue with @lobkovilya, we discovered that for to policies, our logic checks every permutation of tags. However, this is unnecessary because the only possible tags for to policies are kuma.io/service, or no tags at all when the target is Mesh.
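To give a feel for why this gets expensive, here is a minimal sketch of the growth, not the actual Kuma code. It assumes the older flow considers every distinct tag either as matching or as negated, so the number of combinations doubles with each extra tag. `countCombinations` and `report` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// countCombinations walks every assignment of "matches" / "negated" across
// n distinct tags and returns how many assignments were visited.
// Assumption: this mirrors the shape of the old enumeration, not its code.
func countCombinations(n int) int {
	count := 0
	var walk func(i int)
	walk = func(i int) {
		if i == n {
			count++ // one complete assignment reached
			return
		}
		walk(i + 1) // tag i taken as a positive match
		walk(i + 1) // tag i taken as negated
	}
	walk(0)
	return count
}

// report formats the count for n tags as a single line.
func report(n int) string {
	return fmt.Sprintf("%d tags -> %d combinations", n, countCombinations(n))
}

func main() {
	for _, n := range []int{1, 5, 10, 20} {
		fmt.Println(report(n))
	}
}
```

Under that assumption, every additional `to` item with a distinct tag doubles the work, which is consistent with the CPU spike described above.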

The only exception is MeshHTTPRoute, where we generate the __rule-matches-hash__ tag, which points to a hash of the matcher. Since we don’t need to evaluate all permutations, we decided to iterate through the subsets and match them directly with the policies. To optimize further, we first deduplicate entries to avoid redundant computations.
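The approach described above (deduplicate the subsets, then match each one directly against the policies) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `Tag`, `Subset`, `ToItem`, `uniqueSubsets` and `matchDirect` are hypothetical names, and an empty subset stands in for a `Mesh` target.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag is a hypothetical key/value matcher, e.g. kuma.io/service=backend.
type Tag struct {
	Key, Value string
}

// Subset is a hypothetical set of tags targeted by a `to` item.
// An empty Subset models a Mesh target.
type Subset []Tag

// ToItem pairs a subset with the name of the policy that defined it.
type ToItem struct {
	Policy string
	Subset Subset
}

// key builds a canonical string for a subset so duplicates can be detected.
func (s Subset) key() string {
	parts := make([]string, 0, len(s))
	for _, t := range s {
		parts = append(parts, fmt.Sprintf("%s=%s", t.Key, t.Value))
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// isSubsetOf reports whether every tag in s is also present in other.
func (s Subset) isSubsetOf(other Subset) bool {
	for _, t := range s {
		found := false
		for _, o := range other {
			if t == o {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// uniqueSubsets returns the distinct subsets in first-seen order.
func uniqueSubsets(items []ToItem) []Subset {
	seen := map[string]bool{}
	var out []Subset
	for _, it := range items {
		k := it.Subset.key()
		if !seen[k] {
			seen[k] = true
			out = append(out, it.Subset)
		}
	}
	return out
}

// matchDirect lists, per unique subset, the policies whose tags are all
// contained in that subset. No negated combinations are generated.
func matchDirect(items []ToItem) map[string][]string {
	result := map[string][]string{}
	for _, ss := range uniqueSubsets(items) {
		for _, it := range items {
			if it.Subset.isSubsetOf(ss) {
				result[ss.key()] = append(result[ss.key()], it.Policy)
			}
		}
	}
	return result
}

func main() {
	items := []ToItem{
		{Policy: "default", Subset: Subset{}},
		{Policy: "timeout-backend", Subset: Subset{{"kuma.io/service", "backend"}}},
		{Policy: "retry-backend", Subset: Subset{{"kuma.io/service", "backend"}}},
	}
	matched := matchDirect(items)
	for _, ss := range uniqueSubsets(items) {
		fmt.Printf("%q -> %v\n", ss.key(), matched[ss.key()])
	}
}
```

In this sketch the work is bounded by the number of unique subsets times the number of items, rather than growing with every combination of tags.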

After implementing these changes, I deployed the update and ran a performance profile, which showed significant improvements.

[Image: performance profile after the change]

fix: #13149

lobkovilya and others added 2 commits March 20, 2025 15:21
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi changed the title fix(rules): add 'withNegation' flag to simplify to policy flow fix(rules): add withNegation flag to simplify to policy flow Mar 20, 2025
Reviewer Checklist

🔍 Each of these sections needs to be checked by the reviewer of the PR 🔍:
If something doesn't apply, please check the box and add a justification if the reason is not obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It links to the relevant issue as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry)
  • IPv6 is taken into account (e.g. no string concatenation of host and port)
  • Tests (unit tests, E2E tests, manual tests on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add the "backport" label based on these file globs; to prevent it from adding the "backport" label, use the no-backport-autolabel label)

Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi self-assigned this Mar 20, 2025
lukidzi added 2 commits March 20, 2025 18:26
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi added the ci/run-full-matrix PR: Runs all possible e2e test combination (expensive use carefully) label Mar 21, 2025
lukidzi added 2 commits March 20, 2025 19:02
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi marked this pull request as ready for review March 21, 2025 00:20
@lukidzi lukidzi requested a review from a team as a code owner March 21, 2025 00:20
@lukidzi lukidzi requested review from Automaat and Icarus9913 March 21, 2025 00:20
lukidzi added 2 commits March 20, 2025 21:55
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi requested a review from lobkovilya March 21, 2025 10:41
lukidzi added 2 commits March 21, 2025 09:16
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi changed the title fix(rules): add withNegation flag to simplify to policy flow perf(rules): add withNegation flag to simplify to policy flow Mar 21, 2025
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi requested a review from lobkovilya March 21, 2025 15:31
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi requested a review from lobkovilya March 24, 2025 17:38
@lukidzi lukidzi removed the ci/run-full-matrix PR: Runs all possible e2e test combination (expensive use carefully) label Mar 24, 2025
@lukidzi lukidzi enabled auto-merge (squash) March 24, 2025 21:35
@lukidzi lukidzi merged commit c3781d4 into master Mar 24, 2025
13 checks passed
@lukidzi lukidzi deleted the fix/simplify-rules-for-outbounds branch March 24, 2025 22:04
kumahq bot pushed a commit that referenced this pull request Mar 25, 2025
…3151)

## Motivation

> [!NOTE] 
> This is not a problem when using `meshServices.mode: Exclusive` with
policies in a new style

When a user has multiple `to` policies for a single top-level target
reference and a default to Mesh target, CPU usage spikes due to the
expensive matching process.
<img width="2053" alt="Screenshot 2025-03-20 at 18 43 36"
src="https://github.com/user-attachments/assets/259af8fb-357f-4cfe-ae54-257ce0cd36d9"
/>

## Implementation information

After analyzing the issue with @lobkovilya, we discovered that for `to`
policies, our logic checks every permutation of tags. However, this is
unnecessary because the only possible tags for `to` policies are
`kuma.io/service`, or no tags at all when the target is `Mesh`.

The only exception is `MeshHTTPRoute`, where we generate the
`__rule-matches-hash__` tag, which points to a hash of the matcher.
Since we don’t need to evaluate all permutations, we decided to iterate
through the subsets and match them directly with the policies. To
optimize further, we first deduplicate entries to avoid redundant
computations.

After implementing these changes, I deployed the update and ran a
performance profile, which showed significant improvements.


![image](https://github.com/user-attachments/assets/db30fc2e-aafe-4e16-a8ef-df8a9401bd25)


fix: #13149

---------

Signed-off-by: Ilya Lobkov <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Ilya Lobkov <[email protected]>
lukidzi added a commit that referenced this pull request Mar 25, 2025
…ckport of #13151) (#13193)

Automatic cherry-pick of #13151 for branch release-2.10

Generated by
[action](https://github.com/kumahq/kuma/actions/runs/14056954185)

cherry-picked commit c3781d4

Signed-off-by: Ilya Lobkov <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Ilya Lobkov <[email protected]>
bartsmykla pushed a commit that referenced this pull request Mar 26, 2025
…ckport of #13151) (#13196)

Automatic cherry-pick of #13151 for branch release-2.7

Generated by
[action](https://github.com/kumahq/kuma/actions/runs/14056954185)

cherry-picked commit c3781d4

⚠️ ⚠️ ⚠️ Conflicts happened when cherry-picking!
⚠️ ⚠️ ⚠️
```
On branch release-2.7
Your branch is up to date with 'origin/release-2.7'.

You are currently cherry-picking commit c3781d4.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/overriding_meshtimeout.golden.json
	modified:   pkg/plugins/policies/core/matchers/egress.go
	modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.policies.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/meshtimeout.golden.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/single-to.golden.yaml

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/meshhttproute.golden.json
	deleted by us:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/resource_rule_meshtimeout_index.golden.json
	both modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.golden.yaml
	both modified:   pkg/plugins/policies/core/rules/rules.go
	deleted by us:   pkg/plugins/policies/core/rules/subsetutils/subset.go

```

---------

Signed-off-by: Ilya Lobkov <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Ilya Lobkov <[email protected]>
bartsmykla pushed a commit that referenced this pull request Mar 26, 2025
…ckport of #13151) (#13195)

Automatic cherry-pick of #13151 for branch release-2.8

Generated by
[action](https://github.com/kumahq/kuma/actions/runs/14056954185)

cherry-picked commit c3781d4

⚠️ ⚠️ ⚠️ Conflicts happened when cherry-picking!
⚠️ ⚠️ ⚠️
```
On branch release-2.8
Your branch is up to date with 'origin/release-2.8'.

You are currently cherry-picking commit c3781d4.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/overriding_meshtimeout.golden.json
	modified:   pkg/plugins/policies/core/matchers/egress.go
	modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.policies.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/meshtimeout.golden.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/single-to.golden.yaml

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/meshhttproute.golden.json
	deleted by us:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/resource_rule_meshtimeout_index.golden.json
	both modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.golden.yaml
	both modified:   pkg/plugins/policies/core/rules/rules.go
	deleted by us:   pkg/plugins/policies/core/rules/subsetutils/subset.go

```

---------

Signed-off-by: Ilya Lobkov <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Ilya Lobkov <[email protected]>
bartsmykla pushed a commit that referenced this pull request Mar 26, 2025
…ckport of #13151) (#13194)

Automatic cherry-pick of #13151 for branch release-2.9

Generated by
[action](https://github.com/kumahq/kuma/actions/runs/14056954185)

cherry-picked commit c3781d4

⚠️ ⚠️ ⚠️ Conflicts happened when cherry-picking!
⚠️ ⚠️ ⚠️
```
On branch release-2.9
Your branch is up to date with 'origin/release-2.9'.

You are currently cherry-picking commit c3781d4.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/overriding_meshtimeout.golden.json
	modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/resource_rule_meshtimeout_index.golden.json
	modified:   pkg/plugins/policies/core/matchers/egress.go
	modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.golden.yaml
	modified:   pkg/plugins/policies/core/matchers/testdata/matchedpolicies/torules/03.policies.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/meshtimeout.golden.yaml
	modified:   pkg/plugins/policies/core/rules/testdata/rules/to/single-to.golden.yaml

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   pkg/api-server/testdata/resources/inspect/dataplanes/_rules/meshhttproute.golden.json
	both modified:   pkg/plugins/policies/core/rules/rules.go
	deleted by us:   pkg/plugins/policies/core/rules/subsetutils/subset.go

```

---------

Signed-off-by: Ilya Lobkov <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Lukasz Dziedziak <[email protected]>
Co-authored-by: Ilya Lobkov <[email protected]>
Development

Successfully merging this pull request may close these issues.

High CPU usage when using multiple to policies
2 participants