-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.30 Backports 2024-12-03 #1022
base: v1.30
Are you sure you want to change the base?
Conversation
e8071c1
to
7aa0126
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to get #1023 in as well
Yup, I will pick up the change after it is merged into main (just to get main commit sha) |
Merged :-) |
Picked it up 👍 |
[ upstream commit 4879439 ] Include also sni test not having "l7" in it's name, and check the logs for errors as well. Test with embedded cilium-envoy so that Envoy error and warning logs will also be checked for. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Tam Mach <[email protected]>
[ upstream commit db5924e ] Add a runtime assert that the policy destruction happens from the main or test thread. The "test thread" case also covers the destruction from the initial thread, which happens for the static members of 'AllowAllEgressPolicyInstanceImpl' when running `cilium-envoy version', for example. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Tam Mach <[email protected]>
[ upstream commit f09ed99 ] Remove the member 'initial_policy_' from the Cilium SocketOption class, as that reference was possibly kept after the policy had already been deleted. This made it possible for the policy to be destructed from the worker thread, which can lead to Envoy crash. 'initial_policy_' was stored as we already did the policy lookup and the same policy is needed in other Cilium filters. Have the cilium.network and cilium.tls_wrapper filters do their own policy lookups instead, so that we do not need to keep the reference in SocketOption. Worker threads do the policy lookup from their own thread local maps, which hold a reference to the policy. These thread local references are relinquished from post function during policy update, which will release worker thread's last reference to the policy at the time. The main thread is the last one to update or delete policy, so the policy instance destruction happens from the main thread. For this to work it is imperative to only hold policy references in these thread local maps. Note that even keeping a weak reference accessible to the worker threads outside of the policy maps is risky, as then the worker thread could convert that weak reference to a shared pointer while the reference has already been relinquished from the thread's local policy map. In this situation the other threads could also relinquish their references concurrently to the worker thread holding the shared pointer, which would then become the last reference and destruction would happen from the worker thread again. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Tam Mach <[email protected]>
[ upstream commit 7b1dc6c ] Log an error instead of crashing when Cilium NetworkPolicy resources are destructed in a worker thread. Remove stacktrace that is not useful at all with release builds. Prior to enabling use of SDS secrets running NetworkPolicy destructors in a worker thread did not cause visible problems, even though we had taken measures to not do that a long time ago. Now that references to the policy have been removed from the connection metadata (Cilium SocketOption) this should no longer trigger. Nonetheless, crashing Envoy is too drastic as the event may be survivable (as it apparently has been for a long time when running without SDS references in the policy). Enable trace level logging for troubleshooting if this error ever occurs. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Tam Mach <[email protected]>
[upstream commit 16e43f5] Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Tam Mach <[email protected]>
cce5182
to
5373c55
Compare
Note: This backport PR consolidates all recent fixes related to SDS, we can push new commit if required
Once this PR is merged, a GitHub action will update the labels of these PRs: