rr hangs waiting on a process to exit for a process that has already exited when running Firefox #3727
What does /proc/1776030/status (or the equivalent for your next run) show? What's /proc/1776030/stack?
In my current reproduction, 1810308 is reported as sad.
pstree -lpt for the ./mach run invocation:
Sorry, I should have been more clear. In your Scheduler:debug log it shows rr is spinning its wheels on a particular process. What are /proc/pid/status and /proc/pid/stack for that process?
I should have been more clear too ;) The process is gone. There is no such process in the system anymore. This is the current log that it's spinning on despite that:
Oh, the process has actually exited? That's exciting.
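For anyone reproducing this, the check discussed above (does /proc/pid/status still exist for the tid rr is waiting on, and if so, what state is it in?) can be sketched roughly as follows. This is only an illustrative helper, not rr code: `read_proc_status` and `state_line` are hypothetical names, and the sketch assumes a Linux /proc filesystem. Note that a zombie still has a status file; the file only disappears once the task has been reaped.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <sys/types.h>

// Hypothetical helper (not part of rr): returns the contents of
// /proc/<pid>/status, or std::nullopt if the file cannot be opened,
// e.g. because the task has exited and been reaped.
std::optional<std::string> read_proc_status(pid_t pid) {
  assert(pid > 0);
  std::ifstream in("/proc/" + std::to_string(pid) + "/status");
  if (!in) {
    return std::nullopt;
  }
  std::ostringstream contents;
  contents << in.rdbuf();
  return contents.str();
}

// Hypothetical helper: extracts the "State:" line from the text of a
// status file, e.g. "State:\tZ (zombie)", if such a line is present.
std::optional<std::string> state_line(const std::string& status) {
  std::istringstream lines(status);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.rfind("State:", 0) == 0) {  // line starts with "State:"
      return line;
    }
  }
  return std::nullopt;
}
```

If `read_proc_status(tid)` comes back empty for the tid that the Scheduler log keeps mentioning, the task no longer exists, which is the situation described in this thread.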
It turns out using [...] The following YOLO patch seems to work sufficiently well for me and did not immediately set everything on fire[1]:
diff --git a/src/Scheduler.cc b/src/Scheduler.cc
index 42c9f42c..15498337 100644
--- a/src/Scheduler.cc
+++ b/src/Scheduler.cc
@@ -331,7 +331,7 @@ bool Scheduler::is_task_runnable(RecordTask* t, WaitAggregator& wait_aggregator,
}
}
- if (t->waiting_for_ptrace_exit) {
+ if (t->waiting_for_ptrace_exit && !t->was_reaped()) {
LOGM(debug) << " " << t->tid << " is waiting to exit; checking status ...";
} else if (t->is_stopped() || t->was_reaped()) {
LOGM(debug) << " " << t->tid << " was already stopped with status " << t->status();
1:
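To make the effect of the one-line change easier to see, here is a small standalone model of just the branch selection in the diff. It is a sketch under assumptions, not rr's implementation: `ToyTask`, `Branch`, `classify_before`, `classify_after`, and `check_patch_only_changes_reaped_case` are hypothetical names, only the three conditions visible in the diff are modeled, and what rr actually does inside each branch is not represented.

```cpp
#include <cassert>

// Toy stand-in for the RecordTask state that the patch consults.
// Field and method names mirror the diff; everything else is hypothetical.
struct ToyTask {
  bool waiting_for_ptrace_exit = false;
  bool stopped = false;
  bool reaped = false;
  bool is_stopped() const { return stopped; }
  bool was_reaped() const { return reaped; }
};

enum class Branch { WaitForExit, AlreadyStopped, Other };

// Branch selection as written before the patch.
Branch classify_before(const ToyTask& t) {
  if (t.waiting_for_ptrace_exit) {
    return Branch::WaitForExit;
  } else if (t.is_stopped() || t.was_reaped()) {
    return Branch::AlreadyStopped;
  }
  return Branch::Other;
}

// Branch selection with the patch applied: a task that has already been
// reaped no longer takes the "waiting to exit" branch.
Branch classify_after(const ToyTask& t) {
  if (t.waiting_for_ptrace_exit && !t.was_reaped()) {
    return Branch::WaitForExit;
  } else if (t.is_stopped() || t.was_reaped()) {
    return Branch::AlreadyStopped;
  }
  return Branch::Other;
}

// Walks all 8 flag combinations and checks that the two versions disagree
// exactly when the task is both waiting for a ptrace exit and reaped.
void check_patch_only_changes_reaped_case() {
  for (int bits = 0; bits < 8; ++bits) {
    ToyTask t;
    t.waiting_for_ptrace_exit = (bits & 1) != 0;
    t.stopped = (bits & 2) != 0;
    t.reaped = (bits & 4) != 0;
    bool differs = classify_before(t) != classify_after(t);
    assert(differs == (t.waiting_for_ptrace_exit && t.reaped));
  }
}
```

In this model the only state whose handling changes is "waiting for ptrace exit and already reaped", which matches the hang reported here, where rr kept waiting on a process that no longer existed.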
Thank you!
I've recently been experiencing a consistent problem where rr appears to hang when tracing Firefox through the normal mach mochitest and mach run workflows. It happens both on an "AMD Ryzen Threadripper PRO 5975WX 32-Cores" machine, where the zen workaround script had been run and all rr tests passed (after a one-off failure of "#2040: x86/ptrace-32" when run under ctest -12, where it seemed like the Ubuntu crash reporter may have inserted itself into things, but which passed on re-run), and on my previous "Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz" machine. Attempting to ask rr to terminate nicely via SIGTERM resulted in a broken trace (rr replay -a was sad), as did the more extreme SIGKILL.
The process tree inevitably ends up looking like:
The problem disappeared when, under @khuey's direction, I ran with RR_LOG=all:debug. I got 3 consecutive clean runs in this case (rr terminated normally and rr replay -a could be used), so I switched to a debug log of Scheduler:all, where I got one clean run but then started getting bad runs.
Under this mode, a steady-state log spam of the following happens:
I then captured a log where the sad process seems to be "1776030". The transition to the broken state looked like:
Looking back, the log messages related to 1776030 that seemed to lead to that state are (chronologically, older to newer):
(that gets repeated a bunch), then
then it seems like it's forever
@khuey suggested I file an issue and that @rocallahan might have the most expertise for a problem like this.