-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.eh_frame: Read user stacks from kernel context #1342
Conversation
9d5cac3
to
4740546
Compare
Context ======= If a userspace process was compiled with frame pointers, we can rely on `bpf_get_stackid` with the `BPF_F_USER_STACK` flag to fetch user stacks. It does all the hard work, including making sure we can walk user stacks from kernel context, such as when we are executing a system call. When this happens, the kernel saves the userspace registers in memory. The registers we get through the perf events are the current registers. As we are in kernel context, these are registers that point to kernel data and code. For this reason, we need a way to recover said stored registers. If we were building a kernel extension we would use the `task_pt_regs` macro. There's a new helper [1] to achieve exactly this, `bpf_task_pt_regs`, but unfortunately we can't always rely on it's included only in v5.15 or greater. Proposed solution ================= We walk the internal kernel datastructures until we can read the x86_64 registers we need. In case we are in kernel context, and current thread is not a kernel process (kworker), we read the saved task's registers and continue the normal unwinding process. Notes ===== For some reason, not really sure about why; needs more investigation, writing this code has not been a walk in the park. The verifier didn't allow me to do direct pointer dereferences, that's why the code is littered with manual ones. Mostly complained with > `avoid R0 invalid mem access 'scalar'` The other interesting bit is that I can't make the verifier pass when checking for the return value of `set_initial_state` and returning if there is an issue. > dereference of modified ctx ptr R6 off=32 disallowed Additonally, the full `vmlinux.h` has been dumped as we need the definition of a bunch of structs. Future changes / areas of improvement ===================================== - Once we have automatic feature calibration we can use the new helper, if available, which will be more reliable; - This code is pretty fragile. Any changes in the running box, such as running with KASAN=on, will make it fail, resulting in missing kernel stacks; Test Plan ========= Ran it for 1h without issues. The kernel stacks now appear and look reasonable. See PR for more. **kernel tests** ``` ============= Test results: ============= - ✅ 5.4 - ✅ 5.10 - ✅ 5.18 - ✅ 5.19 ``` - [0]: https://github.com/torvalds/linux/blob/3d7cb6b04c3f3115719235cc6866b10326de34cd/arch/x86/include/asm/processor.h#L758-L763 - [1]: torvalds/linux@dd6e10f Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>
d3fce98
to
7a2a3e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ❤️
I need to understand in-depth the implications of using bpf_probe_read_kernel
and bpf_get_current_task
for #1131
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add some sort of documentation to denote which version of the Kernel that vmlinux.h
generated?
Have you ever seen a guideline around which version should be used? Minimum supported version? Maximum supported version?
Never mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work! 🎉
Not entirely related to this review but it's probably a good time to start thinking about having unit tests for our BPF program. While manual e2e tests are helping us right now, I think it'll be nice to make sure that some of the BPF code is behaving the way we intend it to. Cilium has the unit testing framework for their BPF datapath. Maybe we can have something similar: https://docs.cilium.io/en/latest/contributing/testing/bpf/#bpf-testing? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clear and easy to understand, nice job! lgtm
Yes, let's discuss this. Your work to have BTF is super relevant now that access kernel structures.
Good question! As far as I know, as long as the architecture is matched, we can use any version we want that has everything we need. There's a lot of details here, so will gather them in the next week and get back to you.
It seems to be mostly the same, but with extra metadata for BTF. I think it got introduced in
I totally agree! This is something we've been discussing over the next couple of weeks. We need tests to check fo the correctness of the stacks. This is something that both Kemal and I are working on. |
Thanks for your comments, folks! Really appreciate them. I'll merge and continue iterating on this, let's discuss the testing and BTF + CO-RE in the next few days 😄 |
Context
If a userspace process was compiled with frame pointers, we can rely on
bpf_get_stackid
with theBPF_F_USER_STACK
flag to fetch user stacks.It does all the hard work, including making sure we can walk user stacks
from kernel context, such as when we are executing a system call.
When this happens, the kernel saves the userspace registers in memory. The
registers we get through the perf events are the current registers. As we
are in kernel context, these are registers that point to kernel data and
code. For this reason, we need a way to recover said stored registers.
If we were building a kernel extension we would use the
task_pt_regs
macro. There's a new helper 1 to achieve exactly this,
bpf_task_pt_regs
,but unfortunately we can't always rely on it's included only in v5.15 or
greater.
Proposed solution
We walk the internal kernel datastructures until we can read the
x86_64 registers we need. In case we are in kernel context, and current
thread is not a kernel process (kworker), we read the saved task's
registers and continue the normal unwinding process.
Notes
For some reason, not really sure about why; needs more investigation, writing this code has not been a walk in the park. The verifier didn't allow me to do direct pointer dereferences, that's why the code is littered with manual ones. Mostly complained with
The other interesting bit is that I can't make the verifier pass when checking for the return value of
set_initial_state
and returning if there is an issue.Additionally, the full
vmlinux.h
has been dumped as we need the definition of a bunch of structs.Future changes / areas of improvement
Test Plan
Ran it for 1h without issues. The kernel stacks now appear and look reasonable.
kernel tests
Signed-off-by: Francisco Javier Honduvilla Coto [email protected]