This patch updates the policy_filter_check function to check for host workloads as well. In order to make that work, this patch assumes that one more entry exists in policy_filter_maps with key equal to ALL_PODS_POLICY_ID (UINT32_MAX). The inner map of that entry should contain the cgroup_ids of all pods in the system. If a cgroup_id is part of that set, it is a pod workload; if not, it is part of the host workload. Additionally, it assumes that a policy contains an invalid cgroup_id entry with value HOST_SELECTOR_MODE (UINT64_MAX) if the policy itself cares about host workloads.

policy_filter_check works as follows:
1. Check if our cgroup_id matches the policy built from podSelector/containerSelector (similar to before).
   1.1. Return true if we match.
   1.2. Otherwise, continue to check if we care about host workloads.
2. Check if the policy also cares about host workloads.
   2.1. Return false if not.
   2.2. Otherwise, continue to check if our cgroup_id is a host workload.
3. Check if our cgroup_id is part of the ALL_PODS_POLICY_ID entry.
   3.1. Return true if not (our cgroup_id is not a pod workload, so this is a host workload).
   3.2. Return false if it is (our cgroup_id is a pod workload).

Subsequent patches will add support for updating policy_filter_maps as described above.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch adds hostSelector under the tracing policy spec.
Apart from that, it also makes two additional changes:
1. Allow the selectors (i.e. podSelector, containerSelector,
   hostSelector) to be null by adding the +nullable flag.
2. Make the default value {} instead of null.
By default a policy is similar to:
spec:
  hostSelector: {}
  podSelector: {}
  containerSelector: {}
which means that it matches everything.
The user can then add additional filtering to those selectors to remove
unnecessary workloads from a policy.
Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch updates policy_filter_maps to have:
1. One more entry with key equal to ALL_PODS_POLICY_ID (UINT32_MAX). The inner map of that entry contains the cgroup_ids of all pods in the system. If a cgroup_id is part of that set, it is a pod workload; if not, it is part of the host workload.
2. An invalid cgroup_id entry with value HOST_SELECTOR_MODE (UINT64_MAX) in every policy that cares about host workloads. A previous patch handles these inside eBPF to provide the desired behaviour.

This approach adds one more entry, which increases memory consumption by a small factor. This is not a big issue: if a single policy wants to monitor all pods, it needs the same amount of memory anyway, and for the podSelector this extra entry is needed only once.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
The previous patch changes the contents of policy_filter_maps. This patch updates the tests to reflect those changes. Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch adds tests for tracing policy with hostSelector. Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
With the hostSelector we have a race if we rely on the k8s API to update our maps, similar to the race we already have with the podSelector. The fix for that issue is to use the Runtime Hooks. This race causes some flakes in the policyfilter e2e test. As the purpose of this test is not to test the hostSelector, we exclude host workloads to remove that flake. Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This PR adds support for `spec.hostSelector` to enable users to write policies that are applied only to host workloads and/or pod workloads (with additional filtering for the containers if needed). The syntax is:
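The YAML snippet from the original PR description did not survive extraction. Based on the defaults quoted in the commit messages above, a representative sketch is as follows (the `apiVersion`, `kind`, and `metadata` fields are standard TracingPolicy boilerplate added here for context):

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: example-policy
spec:
  hostSelector: {}
  podSelector: {}
  containerSelector: {}
```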
and the previous example will match all workloads, either on the host or inside pods, without any filtering. This is also the default (i.e. if the user does not define any of those selectors in a tracing policy).
If we need to match only host workloads, we need something like:
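The host-only example was likewise lost in extraction; given the semantics described below (only `{}` and `null` are supported for `hostSelector`), a sketch is:

```yaml
spec:
  hostSelector: {}        # match host workloads
  podSelector: null       # do not match any pod workloads
  containerSelector: null
```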
For now we only support two values for `hostSelector`: `{}` (everything) and `null` (nothing). Specific filters are not supported yet.

In order to make that work, we need to change the behaviour of the `podSelector`. Currently, `podSelector: {}` and `podSelector: null` are exactly the same thing and match all workloads (including host workloads). With that behaviour, there is no way to match all pod workloads while excluding host workloads.

In this PR we change that behaviour: `podSelector: {}` now matches all pod workloads, and `podSelector: null` does not match any pod workloads. The behaviour of `containerSelector` is similar, but what it actually does is a second level of filtering on top of the `podSelector`. For this reason, a policy that sets a `containerSelector` while `podSelector` is `null` will not match anything, since the `podSelector` is null. To filter containers, we need to combine the `containerSelector` with a non-null `podSelector` (e.g. `{}`).
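The exact YAML example from the PR description was lost during extraction; a hedged sketch of such a combination is shown below (the `matchExpressions` filter is illustrative):

```yaml
spec:
  hostSelector: null
  podSelector: {}        # must be non-null, or nothing matches
  containerSelector:
    matchExpressions:
      - key: name
        operator: In
        values:
          - main
```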
This PR also provides the ability to set `hostSelector`, `podSelector`, and `containerSelector` equal to `null`, which was not possible before.

Implementation
In order to make that work, this PR adds one more entry in `policy_filter_maps` with key equal to `ALL_PODS_POLICY_ID` (`UINT32_MAX`). The inner map of that entry contains the `cgroup_id`s of all pods in the system. If a `cgroup_id` is part of that set, it is a pod workload; if not, it is part of the host workload. Additionally, each policy that cares about host workloads contains an invalid `cgroup_id` entry with value `HOST_SELECTOR_MODE` (`UINT64_MAX`).

We then update `policy_filter_check` to check those as well. More specifically, `policy_filter_check` does the following:

1. Check if our `cgroup_id` matches the policy entries built from `podSelector`/`containerSelector` (as before); return true on a match, otherwise continue.
2. Check if the policy also cares about host workloads; return false if not, otherwise continue.
3. Check if our `cgroup_id` is part of the `ALL_PODS_POLICY_ID` entry; return true if not (it is not a pod workload, so it is a host workload), and false if it is.

Any updates needed for the `policy_filter_maps` are handled by the agent in a similar way to `podSelector`/`containerSelector`.
Testing
All tests related to the policyfilter have been updated to reflect those changes. Some additional tests have been added to check the `hostSelector` explicitly.
Limitations
This approach has similar issues to Kubernetes Identity Aware Policies: if we rely on the k8s API server, there may be a race between when we add a `cgroup_id` to our maps and when the pod actually starts. The `hostSelector` has similar issues. All of these issues can be resolved by using Runtime Hooks instead of the k8s API server.
Alternatives
Another way to do this is to use the `bpf_current_task_under_cgroup` or `bpf_get_current_ancestor_cgroup_id` eBPF helpers. These allow us to check if a `cgroup_id` is somewhere under a specific cgroup subtree. If that subtree is where k8s adds new pods, then we can achieve the same results. The challenge here is finding the `cgroup_id` that k8s uses to keep all pods. In most cases this is `/sys/fs/cgroup/kubepods.slice/`, but this is not the case in all deployments. There seem to be ways to discover it inside Tetragon, but they may be a bit complicated.

For this reason, we are starting with the approach proposed in this PR. It works in all cases, although it introduces a small increase in memory usage. At this point, the impact does not seem significant enough to justify a more complex design. We can always improve it in future PRs if needed.