BPF token support and testing infrastructure by dylandreimerink · Pull Request #1953 · cilium/ebpf

dylandreimerink · 2026-02-17T15:10:34Z

This PR introduces test utilities that allows us to write tests for BPF tokens. This RunWithToken utility function is largely based on the test code from the kernel selftests but modified to work with Go and its runtime.

The individual commit messages contain more details. The high level is that we can write tests (and subtests) where we start by calling RunWithToken with a set of delegated permissions. The rest of the test body will be executed in an environment with a BPFFS mounted with those permissions.

This PR also includes BPF token support. This takes the "ambient token" approach, where we do not add any new API surface, but we automatically use tokens when they are available. This allows us to experiment with tokens without committing to an API. We can always add more controls in the future.

Instead of relying on hardcoding search paths or environment variables, this implementation scans the /proc/self/mountinfo procfile to find all BPFFS instances mounted in the processes mount namespace. This means we are able to use any BPFFS path, and even multiple BPFFS'es, for example if one FS has no permissions but holds pinned objects, and a secondary BPFFS is used to the delegated tokens. However, there is no method to select a preferred one, if multiple BPFFS'es with delegated privileges exist (which should be uncommon).

A nice side effect of reading the mount info is that we also know which permissions are delegated which is used in this implementation to enhance error messages when using tokens but the token lacks certain permissions.

The main purpose of this PR was to implement the testing utilities, and the PR is structured such that we can drop the actual token implementation if any of the other proposals is more appealing then this one.

Note: This PR currently has cherry-picked a commit from #1950 for its capability syscalls. This PR should be rebased once #1950 is merged.

javiercardona-work · 2026-02-17T22:18:30Z

Thanks for doing this. If you end up landing this, I will withdraw #1948.

I tested this PR in our environment, and it works as documented. In case it can be useful, I'm attaching the test program we used: ambient_token_test.go.zip

This runs in our namespaced container with CAP_BPF permissions and a delegated BPF mountpoint (/run/tw/bpf). The mountpoint is automatically detected:

$ /tmp/ambient-token-test            
ambient-token-test v7

=== Diagnostics ===
  user namespace: user:[4026535428]
  unprivileged_bpf_disabled: 1

  Capabilities before:
  CapInh:       0000000000000000
  CapPrm:       000001c7a9ac7dff
  CapEff:       000001c7a9ac7dff
  CapBnd:       000001c7a9ac7dff
  CapAmb:       0000000000000000

=== Token Create ===
  token.Create(): OK (fd=6, mount=/run/tw/bpf)

=== Feature Probes (ambient token) ===
  prog SocketFilter         supported
  prog Kprobe               supported
  prog SchedCLS             supported
  prog XDP                  supported
  prog TracePoint           supported
  prog CGroupSKB            supported
  map  Hash                 supported
  map  Array                supported
  map  PerfEventArray       supported
  map  LRUHash              supported
  map  RingBuf              supported

=== Map Create (ambient token) ===
  map create: OK (fd=3)

=== Program Load (ambient token) ===
  program load: OK (fd=3)

Done.

When running it without CAP_BPF, token creation and all subsequent bpf operations fail, as expected:

$ /tmp/ambient-token-test --drop-caps
ambient-token-test v7

=== Diagnostics ===
  user namespace: user:[4026535428]
  unprivileged_bpf_disabled: 1

  Capabilities before:
  CapInh:       0000000000000000
  CapPrm:       000001c7a9ac7dff
  CapEff:       000001c7a9ac7dff
  CapBnd:       000001c7a9ac7dff
  CapAmb:       0000000000000000

  Dropping CAP_BPF, CAP_SYS_ADMIN, CAP_PERFMON...
  Capabilities after:
  CapInh:       0000000000000000
  CapPrm:       00000107a98c7dff
  CapEff:       00000107a98c7dff
  CapBnd:       00000107a98c7dff
  CapAmb:       0000000000000000

=== Token Create ===
  token.Create(): BPF Token is not available

=== Feature Probes (ambient token) ===
  prog SocketFilter         detect support for SocketFilter: load program: operation not permitted
  prog Kprobe               detect support for Kprobe: load program: operation not permitted
  prog SchedCLS             detect support for SchedCLS: load program: operation not permitted
  prog XDP                  detect support for XDP: load program: operation not permitted
  prog TracePoint           detect support for TracePoint: load program: operation not permitted
  prog CGroupSKB            detect support for CGroupSKB: load program: operation not permitted
  map  Hash                 detect support for Hash: operation not permitted
  map  Array                detect support for Array: operation not permitted
  map  PerfEventArray       detect support for PerfEventArray: operation not permitted
  map  LRUHash              detect support for LRUHash: operation not permitted
  map  RingBuf              detect support for RingBuf: operation not permitted

=== Map Create (ambient token) ===
  map create: FAILED (creating map: map create: operation not permitted (MEMLOCK may be too low, consider rlimit.RemoveMemlock))

=== Program Load (ambient token) ===
  program load: FAILED (load program: operation not permitted (MEMLOCK may be too low, consider rlimit.RemoveMemlock))

Done.

anryko

Thank you for working on this!

internal/testutils/token_linux.go

map.go

prog.go

internal/testutils/token_linux.go

anryko · 2026-02-19T14:48:41Z

internal/token/token.go

+// Create attempts to create a BPF token, it does so by finding all mounted BPFFS filesystems, and trying to create a
+// token on each of them until it finds one that works.
+//
+// If a token cannot be created, it returns [ErrTokenNotAvailable]. If the OS is not Linux, it returns
+// [internal.ErrNotSupportedOnOS].


If we have a hypothetical situation with two bpffs mounts. One with delegation and one without. And assume that we have all the necessary CAPs. This approach will always succeed and return a token. Then, because of how the rest of the code is written, we always rely on that token (instead of fallback to it on perm issue). So we scope by least privileged access method in this case instead of the most privileged. I did not actually test this... but it seems to me that this is how this works.

Yes, effectively. Token creation would fail for the BPFFS without delegated permissions, and succeed for the one with delegated permissions. So we would have a token, and use it. When a token is given, the kernel will ignore any capabilities the process may have, at least as far as I understand it.

So yes, with this logic, we would not fallback to using the native capabilities when a token is present. But then again, having CAPs and tokens is fundamentally a broken setup, at least from a security perspective.

Got it. If we are able to acquire the token, we always assume that this the intended use. In that case I would suggest to always check the default path first (/sys/fs/bpf) and only after fallback to parsing /proc/self/mountinfo. Also it would make sense to cache the discovered token.

In that case I would suggest to always check the default path first (/sys/fs/bpf) and only after fallback to parsing /proc/self/mountinfo.

I am not sure if we need to be solving for having 2 bpffs'es with delegated permissions just yet.

Also it would make sense to cache the discovered token.

Yes. I have been thinking about this. In the current solution, we do 2 syscalls to get a token every time we load a program / map, so that could add up over time.

We could move the token creation to the collection loader. But we have a few places such as feature probing where no collections are involved.

We could do some sort of straight up cache with a global variable, but it means we hold onto a token file descriptor forever. We usually are very strict about that sort of thing in the library. I had been thinking that perhaps we can use a weak pointer, combined with the normal sys.FD cleanup logic. That way we hold onto tokens in-between GC cycles, but not forever.

In the end I decided to leave that for another day, perhaps I am overthinking it.

It should be safe to assume that if /proc/self/mountinfo file was not changed, the token is valid; and reparse and recreate the token otherwise.

Setting up the environment to be able to use BPF tokens is a bit involved. This commit adds the `RunWithToken` utility function to do this setup. Using tokens requires a few things: * The process must run in a user namespace that is not the initial user namespace of the system. * The process must have the CAP_BPF capability in that user namespace. * A BPFFS must be available that has been mounted with delegated permissions. * The BPFFS must be owned by the user namespace of the process. In addition we want to run some tests with tokens and other without. Some actions involved in setting up the environment for tokens are irreversible, so we have to spawn a subprocess to run the tests with tokens. The `RunWithToken` helper does all of the required setup. It creates a UNIX socket pair to communicate with the subprocess, then executes a new instance of the current test binary. The new process is executed in its own user NS and mount NS. Only the current test is executed in the subprocess. An environment variable is used to detect if we are executing in a subprocess or not. The parent will wait for a BPFFS context to be send over the socket pair and will delegate the permissions specified as arguments to `RunWithToken`. When the subprocess first starts, it check the env var, and does the child side of the setup. It creates a BPFFS context, and sends it to the parent via the UNIX socket. After getting confirmation that the delegation is done, the child mounts the BPFFS, drops capabilities, and does a execve to re-execute itself. This ensures that in the final stage tokens are setup, even during init and global variable initialization. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit makes three modifications to the type generation. The first is that types and funcs for the token create command are generated. Second is that we generate a ...FromString method for enums, which allows us to parse an enum from string. And third is that we generate if any enum has a value with the __MAX prefix, we will remove the underscores so this max value can be used to iterate over all valid enum values. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds logic to create BPF tokens. We parse the mountinfo proc file to discover all bpffs mounts. We then attempt to create a token from each mount until we find one that works. This means that we do not have to hard code any paths or pass them via environment variables. A side effect is that parsing the mountinfo tells us exactly what the delegated permissions are of the token. This allows us to create more informative error messages or to skip trying to execute syscalls we know will fail. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds token support for programs. It uses the "ambient" token approach, which means that the token is not explicitly passed by a user but instead we always attempt to create a token when loading programs and fall back to loading without a token if token creation fails. This commit also adds rich errors for token related failure. If loading a program fails with EPERM, and without a verifier log we check if the token is missing a BPF command, program type, or attach type which would explain the failure. If we find an issue, we enrich the error to assist users in debugging token related issues. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds token support for maps. It uses the "ambient" token approach, which means that the token is not explicitly passed by a user but instead we always attempt to create a token when loading maps and fall back to loading without a token if token creation fails. This commit also adds rich errors for token related failure. If loading a map fails with EPERM we check if the token is missing a BPF command or map type which would explain the failure. If we find an issue, we enrich the error to assist users in debugging token related issues. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds token support for BTF. It uses the "ambient" token approach, which means that the token is not explicitly passed by a user but instead we always attempt to create a token when loading BTF and fall back to loading without a token if token creation fails. This commit also adds rich errors for token related failure. If loading a BTF fails with EPERM we check if the token is missing the BPF command permission which would explain the failure. If we find an issue, we enrich the error to assist users in debugging token related issues. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

dylandreimerink · 2026-02-23T14:30:12Z

Rebased on main and updated to match changes made in #1950. Marking ready for review.

javiercardona-work

The updated version of this branch still passes our tests in our containerized environment.

[09:47:25 root@tsp_prn/mks/mks-system-pool-jcardona-sparklemuffin-kubelet/0 ~]$ /tmp/ambient-token-test 
ambient-token-test v9

=== Diagnostics ===
  user namespace: user:[4026535252]
  unprivileged_bpf_disabled: 1

  Capabilities before:
  CapInh:       0000000000000000
  CapPrm:       000001c7a9ac7dff
  CapEff:       000001c7a9ac7dff
  CapBnd:       000001c7a9ac7dff
  CapAmb:       0000000000000000

=== Token Create ===
  token.Create(): OK (fd=6, mount=/run/tw/bpf)

=== Feature Probes (ambient token) ===
  prog SocketFilter         supported
  prog Kprobe               supported
  prog SchedCLS             supported
  prog XDP                  supported
  prog TracePoint           supported
  prog CGroupSKB            supported
  map  Hash                 supported
  map  Array                supported
  map  PerfEventArray       supported
  map  LRUHash              supported
  map  RingBuf              supported

=== Map Create (ambient token) ===
  map create: OK (fd=3)

=== Program Load (ambient token) ===
  program load: OK (fd=3)

Done.

javiercardona-work · 2026-02-23T17:14:34Z

internal/testutils/token_linux.go

+	}
+
+	for line := range strings.Lines(string(passwdFile)) {
+		cols := strings.Split(line, ":")


malformed passwd file might have lines with len(cols) < 4. Maybe check?

Suggested change

cols := strings.Split(line, ":")

cols := strings.Split(line, ":")

if len(cols) < 4 {

continue

}

The man page man 5 passwd states:

/etc/passwd contains one line for each user account, with seven fields delimited by colons (“:”).

So we can expect up to 7 fields. We only need access to UID and GID, which are the third and forth fields (index 2 and 3). So I don't think we can change this check. If we don't have at least 4 fields, we can't handle it.

I was referring to "malformed" files (i.e. due to human error, disk corruption, or malicious activity). But I suppose if that happens, we have other things to worry about.

javiercardona-work · 2026-02-23T17:37:39Z

internal/testutils/token_linux.go

+		gid, err := strconv.Atoi(cols[3])
+		if err != nil {
+			return nil, nil, fmt.Errorf("parse gid: %v", err)
+		}
+		gids = append(gids, syscall.SysProcIDMap{
+			ContainerID: gid,
+			HostID:      gid,
+			Size:        1,
+		})
+	}


As multiple users may have the same GID, the gid mapping constructed here might have multiple entries for the same GID. I believe the kernel will consider these overlapping entries and reject the mounts. Maybe dedup as...

Suggested change

gid, err := strconv.Atoi(cols[3])

if err != nil {

return nil, nil, fmt.Errorf("parse gid: %v", err)

}

gids = append(gids, syscall.SysProcIDMap{

ContainerID: gid,

HostID: gid,

Size: 1,

})

}

gid, err := strconv.Atoi(cols[3])

seen := map[int]bool{}

if err != nil {

return nil, nil, fmt.Errorf("parse gid: %v", err)

}

if !seen[gid] {

seen[gid] = true

gids = append(gids, syscall.SysProcIDMap{

ContainerID: gid,

HostID: gid,

Size: 1,

})

}

javiercardona-work · 2026-03-02T17:43:44Z

@rgo3 I there something I can do to help merge this PR? Happy to run additional tests on my environment.

dylandreimerink · 2026-03-03T13:47:49Z

I there something I can do to help merge this PR?

We are waiting for @ti-mo to review, since I made the PR, and he is the only other project maintainer. But Timo is a busy man, with a very long todo list, so this just takes some time.

All reviews are welcome, but Timo's approval determines merge-ability of the PR.

dylandreimerink force-pushed the feature/tokens-with-tests branch 2 times, most recently from e80d8a8 to fa9614c Compare February 17, 2026 15:43

dylandreimerink force-pushed the feature/tokens-with-tests branch 2 times, most recently from 8f7f344 to c2149ea Compare February 18, 2026 12:51

anryko reviewed Feb 19, 2026

View reviewed changes

anryko mentioned this pull request Feb 19, 2026

Adding support for privilege delegation using BPF Token #1921

Open

javiercardona-work mentioned this pull request Feb 19, 2026

Add BPF token support for program and map loading #1948

Closed

dylandreimerink force-pushed the feature/tokens-with-tests branch 2 times, most recently from 50c6abc to 0209971 Compare February 20, 2026 12:19

ti-mo changed the title ~~Add BPF token testing infa and BPF token support~~ BPF token support and testing infrastructure Feb 23, 2026

dylandreimerink force-pushed the feature/tokens-with-tests branch from 0209971 to d29e5d5 Compare February 23, 2026 14:13

dylandreimerink added 6 commits February 23, 2026 15:17

dylandreimerink force-pushed the feature/tokens-with-tests branch from d29e5d5 to a416a64 Compare February 23, 2026 14:22

dylandreimerink marked this pull request as ready for review February 23, 2026 14:30

dylandreimerink requested review from a team and rgo3 as code owners February 23, 2026 14:30

javiercardona-work approved these changes Feb 23, 2026

View reviewed changes

Conversation

dylandreimerink commented Feb 17, 2026

Uh oh!

javiercardona-work commented Feb 17, 2026

Uh oh!

anryko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dylandreimerink commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

javiercardona-work left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

javiercardona-work Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

javiercardona-work commented Mar 2, 2026

Uh oh!

dylandreimerink commented Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dylandreimerink commented Feb 23, 2026 •

edited

Loading

javiercardona-work Feb 23, 2026 •

edited

Loading