Add support for spec.hostSelector #4814

Open

tpapagian wants to merge 7 commits into main from pr/apapag/hostSelector
Conversation

@tpapagian
Member

tpapagian commented Mar 31, 2026

This PR adds support for spec.hostSelector to enable users to write policies that are applied only to host workloads and/or pod workloads (with additional filtering for the containers if needed).

The syntax is:

spec:
  hostSelector: {}
  podSelector: {}
  containerSelector: {}

The example above matches all workloads, whether on the host or inside pods, without any filtering. This is also the default (i.e. if the user does not define any of these selectors in a tracing policy).

To match only host workloads, we need something like:

spec:
  hostSelector: {}
  podSelector: null
  containerSelector: null

For now we support only two values for hostSelector: {} (match everything) and null (match nothing). Specific filters are not supported yet.

To make that work, we need to change the behaviour of podSelector. Currently, podSelector: {} and podSelector: null mean exactly the same thing and match all workloads (including host workloads). With that behaviour, there is no way to match all pod workloads while excluding host workloads.

This PR changes that behaviour. Now podSelector: {} will match all pod workloads and podSelector: null will not match any pod workloads. The behaviour of containerSelector is similar, but it actually acts as a second level of filtering on top of podSelector. For this reason:

spec:
  hostSelector: ~
  podSelector: ~
  containerSelector:
    matchExpressions:
      - key: name
        operator: In
        values:
        - ubuntu

will not match anything, as podSelector is null. We need something like:

spec:
  hostSelector: ~
  podSelector: {}
  containerSelector:
    matchExpressions:
      - key: name
        operator: In
        values:
        - ubuntu

to filter containers.

This PR also makes it possible to set hostSelector, podSelector, and containerSelector to null, which was not possible before.
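The null-versus-{} distinction can be modeled with pointer fields on the decoded spec, where a nil pointer means null and a pointer to an empty selector means {}. This is a minimal Go sketch, not Tetragon's actual types; LabelSelector and describe here are illustrative stand-ins.

```go
package main

import "fmt"

// LabelSelector is a stand-in for the Kubernetes metav1.LabelSelector type.
type LabelSelector struct {
	MatchLabels map[string]string
}

// PolicySpec mirrors the tracing-policy spec fields discussed above.
// Pointer fields let us distinguish `null` (nil pointer) from `{}`
// (pointer to an empty selector) after YAML/JSON decoding.
type PolicySpec struct {
	HostSelector      *LabelSelector
	PodSelector       *LabelSelector
	ContainerSelector *LabelSelector
}

// describe reports how a selector value is interpreted under the new semantics.
func describe(s *LabelSelector) string {
	switch {
	case s == nil:
		return "null: matches nothing"
	case len(s.MatchLabels) == 0:
		return "{}: matches everything"
	default:
		return "filtered match"
	}
}

func main() {
	// A host-only policy: hostSelector is {}, pod/container are left null.
	hostOnly := PolicySpec{HostSelector: &LabelSelector{}}
	fmt.Println("hostSelector:", describe(hostOnly.HostSelector)) // {}: matches everything
	fmt.Println("podSelector:", describe(hostOnly.PodSelector))   // null: matches nothing
}
```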

Implementation

To make that work, this PR adds one more entry in policy_filter_maps whose key is ALL_PODS_POLICY_ID (UINT32_MAX). The inner map of that entry contains the cgroup_ids of all pods in the system. If a cgroup_id is part of that set, it is a pod workload; if not, it is part of the host workload. Additionally, each policy that cares about host workloads contains an invalid cgroup_id entry with value HOST_SELECTOR_MODE (UINT64_MAX).

Then we update policy_filter_check to check those as well. More specifically, policy_filter_check does the following:

  1. Check if our cgroup_id matches the policy built from podSelector/containerSelector:
     1. return true if it matches;
     2. otherwise, continue to check whether the policy cares about host workloads.
  2. Check if the policy also cares about host workloads:
     1. return false if not;
     2. otherwise, continue to check whether our cgroup_id is a host workload.
  3. Check if our cgroup_id is part of ALL_PODS_POLICY_ID:
     1. return true if not (our cgroup_id is not a pod workload, so it is a host workload);
     2. return false if it is (our cgroup_id is a pod workload).
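The three-step check above can be modeled in userspace Go (the real implementation is eBPF C operating on BPF maps); constants and the map layout follow the PR description, while function and type names are illustrative.

```go
package main

import "fmt"

const (
	allPodsPolicyID  = ^uint32(0) // ALL_PODS_POLICY_ID (UINT32_MAX)
	hostSelectorMode = ^uint64(0) // HOST_SELECTOR_MODE (UINT64_MAX)
)

// policyFilterMaps models the outer map: policy ID -> set of cgroup IDs.
type policyFilterMaps map[uint32]map[uint64]struct{}

// policyFilterCheck mirrors the three-step logic described above:
// 1) direct match on the policy's pod/container-derived cgroup set,
// 2) bail out unless the policy opted into host workloads,
// 3) treat any cgroup absent from ALL_PODS_POLICY_ID as a host workload.
func policyFilterCheck(maps policyFilterMaps, policyID uint32, cgroupID uint64) bool {
	policy, ok := maps[policyID]
	if !ok {
		return false
	}
	if _, ok := policy[cgroupID]; ok {
		return true // step 1: cgroup matched podSelector/containerSelector
	}
	if _, ok := policy[hostSelectorMode]; !ok {
		return false // step 2: policy does not care about host workloads
	}
	_, isPod := maps[allPodsPolicyID][cgroupID]
	return !isPod // step 3: not a pod cgroup => host workload => match
}

func main() {
	maps := policyFilterMaps{
		42:              {hostSelectorMode: {}}, // a host-only policy
		allPodsPolicyID: {1001: {}},             // cgroup 1001 belongs to a pod
	}
	fmt.Println(policyFilterCheck(maps, 42, 1001)) // false: pod workload
	fmt.Println(policyFilterCheck(maps, 42, 2002)) // true: host workload
}
```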

Any updates needed for policy_filter_maps are handled by the agent, in a similar way to podSelector/containerSelector.
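Keeping the ALL_PODS_POLICY_ID entry in sync could look roughly like the sketch below, with the agent adding and removing pod cgroup_ids as pods come and go; the function names are hypothetical and the real agent updates BPF maps, not Go maps.

```go
package main

import "fmt"

// allPodsPolicyID is assumed to be UINT32_MAX, as in the PR description.
const allPodsPolicyID = ^uint32(0)

// trackPodCgroup records a pod's cgroup_id in the ALL_PODS_POLICY_ID set
// when the agent learns about a new pod.
func trackPodCgroup(maps map[uint32]map[uint64]struct{}, cgroupID uint64) {
	if maps[allPodsPolicyID] == nil {
		maps[allPodsPolicyID] = make(map[uint64]struct{})
	}
	maps[allPodsPolicyID][cgroupID] = struct{}{}
}

// untrackPodCgroup removes the cgroup_id when the pod goes away.
func untrackPodCgroup(maps map[uint32]map[uint64]struct{}, cgroupID uint64) {
	delete(maps[allPodsPolicyID], cgroupID)
}

func main() {
	maps := map[uint32]map[uint64]struct{}{}
	trackPodCgroup(maps, 1001)
	fmt.Println(len(maps[allPodsPolicyID])) // 1
	untrackPodCgroup(maps, 1001)
	fmt.Println(len(maps[allPodsPolicyID])) // 0
}
```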

Testing

All tests related to the policyfilter have been updated to reflect these changes. Additional tests have been added to check hostSelector explicitly.

Limitations

This approach has issues similar to Kubernetes Identity Aware Policies: if we rely on the k8s API server, there may be a race between when we add a cgroup_id to our maps and when the pod actually starts. The hostSelector has the same problem. All of these issues can be resolved by using Runtime Hooks instead of the k8s API server.

Alternatives

Another way to do this is to use the bpf_current_task_under_cgroup or bpf_get_current_ancestor_cgroup_id eBPF helpers. These allow us to check whether a cgroup_id is somewhere under a specific cgroup subtree. If that subtree is where k8s adds new pods, we can achieve the same result. The challenge is finding the cgroup_id that k8s uses to keep all pods. In most cases this is /sys/fs/cgroup/kubepods.slice/, but not in all deployments. There seem to be ways to get that inside Tetragon, but it may be a bit complicated.

For this reason, we are starting with the approach proposed in this PR. It works in all cases, although it introduces a small increase in memory usage. At this point, the impact does not seem significant enough to justify a more complex design. We can always improve it in future PRs if needed.

@tpapagian tpapagian force-pushed the pr/apapag/hostSelector branch 4 times, most recently from ced9131 to 2be16a3 Compare April 1, 2026 10:33
@tpapagian tpapagian added the release-note/misc This PR makes changes that have no direct user impact. label Apr 1, 2026
@tpapagian tpapagian force-pushed the pr/apapag/hostSelector branch 2 times, most recently from 17abc04 to 33be313 Compare April 1, 2026 16:51
@netlify

netlify bot commented Apr 1, 2026

Deploy Preview for tetragon ready!

🔨 Latest commit c6afde5
🔍 Latest deploy log https://app.netlify.com/projects/tetragon/deploys/69d50701aab13e0008a20fbc
😎 Deploy Preview https://deploy-preview-4814--tetragon.netlify.app
@tpapagian tpapagian force-pushed the pr/apapag/hostSelector branch 8 times, most recently from 19ffe02 to 575ddd6 Compare April 6, 2026 16:21
@tpapagian tpapagian changed the title Test hostSelector Add support for spec.hostSelector Apr 7, 2026
@tpapagian tpapagian force-pushed the pr/apapag/hostSelector branch from 575ddd6 to 40b559c Compare April 7, 2026 11:53
This patch updates the policy_filter_check function to check for host
workloads as well.

To make that work, this patch assumes that one more entry exists in
policy_filter_maps whose key is ALL_PODS_POLICY_ID (UINT32_MAX).
The inner map of that entry should contain the cgroup_ids of all pods
in the system. If a cgroup_id is part of that set, it is a pod
workload; if not, it is part of the host workload.
Additionally, it assumes that a policy contains an invalid cgroup_id
entry with value HOST_SELECTOR_MODE (UINT64_MAX) if the policy itself
cares about host workloads.

policy_filter_check works as follows:
1. check if our cgroup_id matches the policy built from podSelector/containerSelector (similar to before)
1.1. return true if it matches
1.2. otherwise, continue to check whether the policy cares about host workloads
2. check if the policy also cares about host workloads
2.1. return false if not
2.2. otherwise, continue to check whether our cgroup_id is a host workload
3. check if our cgroup_id is part of ALL_PODS_POLICY_ID
3.1. return true if not (our cgroup_id is not a pod workload, so this is a host workload)
3.2. return false if it is (our cgroup_id is a pod workload)

Next patches will add support for updating policy_filter_maps as
previously described.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch adds hostSelector under tracing policy spec.

Apart from that, it also makes two additional changes:
1. Allow the selectors (i.e. podSelector, containerSelector, hostSelector)
   to be null by adding the +nullable flag.
2. Make the default value {} instead of null.

By default a policy is similar to:
spec:
  hostSelector: {}
  podSelector: {}
  containerSelector: {}

which means that it matches everything.

The user can then add additional filtering to these to remove unnecessary
workloads from a policy.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch updates policy_filter_maps to have:
1. One more entry whose key is ALL_PODS_POLICY_ID (UINT32_MAX).
   The inner map of that entry contains the cgroup_ids of all pods in
   the system. If a cgroup_id is part of that set, it is a pod
   workload; if not, it is part of the host workload.
2. An invalid cgroup_id entry with value HOST_SELECTOR_MODE (UINT64_MAX)
   in every policy that cares about host workloads.

A previous patch handles those inside eBPF to provide the desired
behaviour.

This approach adds one more entry, which increases memory consumption
by a small factor. This is not a big issue, as a single policy that
wants to monitor all pods would need the same amount of memory anyway.
For the podSelector case this is needed only once.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
The previous patch changes the contents of policy_filter_maps. This
patch updates the tests to reflect those changes.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
This patch adds tests for tracing policy with hostSelector.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
With the hostSelector we have a race if we rely on the k8s API to update
our maps. This is similar to the race that we have with the podSelector.
The fix for that issue is to use the Runtime Hooks.

This race causes some flakes in the policyfilter e2e test. As the purpose
of this test is not to test the hostSelector, we exclude host workloads
to remove that flake.

Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
Signed-off-by: Anastasios Papagiannis <anastasios.papagiannis@isovalent.com>
@tpapagian tpapagian force-pushed the pr/apapag/hostSelector branch from 40b559c to c6afde5 Compare April 7, 2026 13:30
@tpapagian tpapagian added release-note/minor This PR introduces a minor user-visible change and removed release-note/misc This PR makes changes that have no direct user impact. labels Apr 7, 2026
@tpapagian tpapagian marked this pull request as ready for review April 7, 2026 14:23
@tpapagian tpapagian requested a review from a team as a code owner April 7, 2026 14:23
@tpapagian tpapagian requested a review from mtardy as a code owner April 7, 2026 14:23
