stack quality: investigate user stack unwinding issues #522
Replies: 9 comments
-
Bisecting
As the results did not make sense, I manually bisected this once again, but ran [...]
Identical configuration across machines:
I couldn't spot any meaningful differences or bugs in our code, so I ran [...]
It can't! This is odd! Thinking that skid could somehow be the problem, I tried with [...]
-
Unwinding the stack with DWARF instead of frame pointers [...]
As @brancz pointed out, there's probably something off with the frame pointers in some of libc's frames that is preventing correct stack walking.
Something to check, then, is what differs between the hosts: are some libc functions on one of the hosts inlined, do they lack the standard function prologue, or is it something else?
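A quick way to compare hosts on the prologue question is to check whether a function's first bytes are the classic frame-pointer prologue. Below is a minimal sketch of that heuristic; `has_fp_prologue` is a hypothetical helper (not part of parca-agent) and only recognises the exact sequence `push %rbp; mov %rsp,%rbp` (bytes `55 48 89 e5`), optionally preceded by `endbr64` (`f3 0f 1e fa`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86-64 encodings used by the check. */
static const uint8_t ENDBR64[] = {0xf3, 0x0f, 0x1e, 0xfa}; /* endbr64 (CET) */
static const uint8_t FP_PROLOGUE[] = {
    0x55,             /* push %rbp     */
    0x48, 0x89, 0xe5, /* mov %rsp,%rbp */
};

/* Hypothetical helper: returns true if the first `len` bytes of a function
 * start with `push %rbp; mov %rsp,%rbp`, optionally after an `endbr64`.
 * Heuristic only: a compiler may schedule other instructions first, or set
 * up the frame later in the function body. */
static bool has_fp_prologue(const uint8_t *code, size_t len) {
    assert(code != NULL || len == 0);
    size_t off = 0;
    if (len >= sizeof ENDBR64 && memcmp(code, ENDBR64, sizeof ENDBR64) == 0)
        off = sizeof ENDBR64;
    if (len - off < sizeof FP_PROLOGUE)
        return false;
    return memcmp(code + off, FP_PROLOGUE, sizeof FP_PROLOGUE) == 0;
}
```

For example, the first instruction of libc's `close` in the disassembly in this thread, `mov %fs:0x18,%eax` (encoded `64 8b 04 25 18 00 00 00`), does not match, and neither does `open64`'s `push %r12` (`41 54`). A negative result is a hint rather than proof that the function never maintains a frame pointer.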
-
Oddly enough, even though we continue following the stack when walking with FPs from [...]
Breakpoint 1, 0x00007fc3b8f286b0 in close () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) disassemble 0x00007fc3b8f286b0
Dump of assembler code for function close:
=> 0x00007fc3b8f286b0 <+0>: mov %fs:0x18,%eax
0x00007fc3b8f286b8 <+8>: test %eax,%eax
0x00007fc3b8f286ba <+10>: jne 0x7fc3b8f286d0 <close+32>
0x00007fc3b8f286bc <+12>: mov $0x3,%eax
0x00007fc3b8f286c1 <+17>: syscall
How does it work? What am I missing? 🤔
-
When unwinding with DWARF, frame pointers aren't used at all; the unwinder relies on unwind tables [...]
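To make the difference concrete: a DWARF-based unwinder finds the call frame information (CFI) row covering the current PC, computes the canonical frame address (CFA) from a register plus an offset, and reads the return address at a fixed offset from the CFA; `%rbp` is only consulted if the table says so. The sketch below makes simplifying assumptions: each function has a single row of the form CFA = `%rsp` + offset, the return address sits at CFA - 8 (the x86-64 convention), and the stack is modelled as an array of 8-byte slots. The names `cfi_row`, `stack_mem` and `dwarf_step` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified CFI row: for pc in [start, end), CFA = %rsp + cfa_offset.
 * Assumption: one %rsp-based row per function. Real .eh_frame tables add a
 * row at every prologue/epilogue change and can also use %rbp-based rules,
 * saved-register rules and DWARF expressions. */
struct cfi_row {
    uint64_t start, end;
    uint64_t cfa_offset;
};

/* Hypothetical memory model: 8-byte stack slots starting at `base`. */
struct stack_mem {
    uint64_t base;
    const uint64_t *slots;
    size_t nslots;
};

static int read_u64(const struct stack_mem *m, uint64_t addr, uint64_t *out) {
    if (addr < m->base || (addr - m->base) % 8 != 0 ||
        (addr - m->base) / 8 >= m->nslots)
        return -1;
    *out = m->slots[(addr - m->base) / 8];
    return 0;
}

/* One unwind step that uses only the table and %rsp; %rbp is never read.
 * On success, *pc becomes the return address stored at CFA - 8 and *rsp
 * becomes the CFA (the caller's %rsp after the return on x86-64).
 * Real unwinders usually look up pc - 1 for caller frames, because a return
 * address can point just past the end of the calling function. */
static int dwarf_step(const struct cfi_row *rows, size_t nrows,
                      const struct stack_mem *m, uint64_t *pc, uint64_t *rsp) {
    assert(rows != NULL && m != NULL && pc != NULL && rsp != NULL);
    for (size_t i = 0; i < nrows; i++) {
        if (*pc < rows[i].start || *pc >= rows[i].end)
            continue;
        uint64_t cfa = *rsp + rows[i].cfa_offset;
        uint64_t ra;
        if (read_u64(m, cfa - 8, &ra) != 0)
            return -1;
        *pc = ra;
        *rsp = cfa;
        return 0;
    }
    return -1; /* no unwind information covers this pc */
}
```

With a row of CFA = `%rsp` + 8 for a prologue-less function such as `close`, the step recovers the immediate caller's return address straight from the stack, which is why a missing frame pointer does not matter to this kind of unwinder.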
-
@brancz, sorry if my last comment was confusing. The thing that I find odd is how [...]
-
TL;DR: The issue, as we suspected, lies in the lack of frame pointers in libc, which results in broken stack traces. The reason why [...]
Context
There are a couple of things that @Sylfrena, @kakkoyun and I wanted to understand:
Frame pointer setup
All the functions in our program set up the frame pointer correctly:
(gdb) disassemble main
Dump of assembler code for function main():
0x0000000000401300 <+0>: push %rbp
0x0000000000401301 <+1>: mov %rsp,%rbp
[...]
(gdb) disassemble a1
Dump of assembler code for function _Z2a1v:
0x0000000000401280 <+0>: push %rbp
0x0000000000401281 <+1>: mov %rsp,%rbp
[...]
(gdb) disassemble a2
Dump of assembler code for function _Z2a2v:
0x00000000004012b0 <+0>: push %rbp
0x00000000004012b1 <+1>: mov %rsp,%rbp
[...]
(gdb) disassemble top
Dump of assembler code for function _Z3topv:
0x0000000000401220 <+0>: push %rbp
0x0000000000401221 <+1>: mov %rsp,%rbp
[...]
However, that is not the case for the libc functions:
(gdb) disassemble __close
Dump of assembler code for function close:
0x00007fc2b9c356b0 <+0>: mov %fs:0x18,%eax
0x00007fc2b9c356b8 <+8>: test %eax,%eax
0x00007fc2b9c356ba <+10>: jne 0x7fc2b9c356d0 <close+32>
0x00007fc2b9c356bc <+12>: mov $0x3,%eax
0x00007fc2b9c356c1 <+17>: syscall
[...]
(gdb) disassemble open64
Dump of assembler code for function open64:
0x00007fc2b9c34b90 <+0>: push %r12
0x00007fc2b9c34b92 <+2>: mov %esi,%r10d
0x00007fc2b9c34b95 <+5>: mov %esi,%r12d
0x00007fc2b9c34b98 <+8>: push %rbp
0x00007fc2b9c34b99 <+9>: mov %rdi,%rbp
0x00007fc2b9c34b9c <+12>: sub $0x68,%rsp
[...]
So how come [...]
Unwinding the stack by hand (using frame pointers)
Let's see the return address of the caller of [...]
It makes sense that we see [...]. If we execute a few more instructions, then we see [...]
So far, so good: as expected, everything works fine in the functions from our binary! Let's take a look at the [...]
Libc functions
(gdb) b __close
Breakpoint 1 at 0x7f90433816b0
(gdb) c
Continuing.
Breakpoint 1, 0x00007f90433816b0 in close () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol *(uint64_t*)($rbp + sizeof(uint64_t))
c1() + 9 in section .text of target:/app/parca-demo
Here we are seeing [...]
So, back to the original question: why did [...]
Why was [...]
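The two libc behaviours in the disassembly above can be reproduced with a toy frame-pointer walker. This is a minimal sketch over a hypothetical memory model with made-up addresses, not parca-agent's BPF implementation: it records the current PC, then repeatedly reads the return address at `%rbp + 8` and the saved `%rbp` at `%rbp + 0`, stopping when `%rbp` leaves the stack or stops moving towards older frames. `stack_mem` and `fp_walk` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory model: 8-byte stack slots starting at `base`. */
struct stack_mem {
    uint64_t base;
    const uint64_t *slots;
    size_t nslots;
};

static int read_u64(const struct stack_mem *m, uint64_t addr, uint64_t *out) {
    if (addr < m->base || (addr - m->base) % 8 != 0 ||
        (addr - m->base) / 8 >= m->nslots)
        return -1;
    *out = m->slots[(addr - m->base) / 8];
    return 0;
}

/* Toy frame-pointer walk. out[0] is the current pc; each further entry is the
 * return address stored at rbp + 8, after which rbp is replaced by the saved
 * rbp stored at rbp + 0. Stops when rbp is 0, points outside the stack, or
 * does not move towards older (higher-addressed) frames. Returns the number
 * of entries written to out. */
static size_t fp_walk(const struct stack_mem *m, uint64_t pc, uint64_t rbp,
                      uint64_t *out, size_t max) {
    assert(m != NULL && (out != NULL || max == 0));
    size_t n = 0;
    if (max == 0)
        return 0;
    out[n++] = pc;
    while (n < max && rbp != 0) {
        uint64_t ra, saved_rbp;
        if (read_u64(m, rbp + 8, &ra) != 0 || read_u64(m, rbp, &saved_rbp) != 0)
            break; /* rbp is not a stack address, e.g. it was reused */
        out[n++] = ra;
        if (saved_rbp <= rbp)
            break; /* end of the chain (0) or a corrupt link */
        rbp = saved_rbp;
    }
    return n;
}
```

If a leaf function such as `close` never pushes `%rbp`, then `%rbp` still points at its caller's frame, so the first return address read at `%rbp + 8` belongs to the caller's caller and the immediate caller is silently dropped from the walk, while the rest of the chain still looks fine. If a function instead reuses `%rbp` as a general-purpose register, as `open64` does with `mov %rdi,%rbp`, a walk started at that point would read from unrelated memory; with the bounds check above it stops after the first frame.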
-
As a remediation until DWARF unwinding is fleshed out and released: if you bump into this issue and run on Ubuntu, installing [...]
-
Update: this has been fixed and has been stable for several months (#1055). Closing!
-
(Opening in parca-agent, but this issue might be in parca.)
I noticed that some kernel stacks aren't correct with the current master of both parca and parca-agent. The test program and the correct result that we used to get can be found in this PR (#395).
Note the kernel stack for the open() codepath, which does not end up in the right merged stacktrace:
I am currently bisecting both projects and will report back once I have more information.