You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
> uname -a
Linux terrestrial-gamma-ray-flash 6.13.5-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 27 Feb 2025 18:09:27 +0000 x86_64 GNU/Linux
Describe your bug.
WalkParallel::visit() will nondeterministically hang if the ParallelVisitor::visit() implementation panics. This also occurs when providing a closure to WalkParallel::run().
// Our stack isn't blocking. Instead of burning the
// CPU waiting, we let the thread sleep for a bit. In
// general, this tends to only occur once the search is
// approaching termination.
let dur = std::time::Duration::from_millis(1);
std::thread::sleep(dur);
}
it will print this out endlessly.
This especially seems to occur more often if the panic handler is non-trivial: I first noticed this when using color-eyre, which prints the panic to the console after composing a backtrace.
What is the expected behavior?
The WalkParallel::visit() method should not hang, and should instead propagate the panic.
My first attempt to solve this (see cosmicexplorer@246c162) set the quit_now flag if any worker panics, and checked self.is_quit_now() within the sub-loop. This seems to have made it repro less often, but I am still able to get it to repro regardless, even without modifying the panic handler.
There are some other changes I want to propose to the ParallelVisitor{,Builder} traits as well which may make this easier (such as propagating unwind safety), but I'm currently hoping to solve this without making any changes to the external interface. This may require changing the parallelism primitives we use, but maybe not.
The text was updated successfully, but these errors were encountered:
Please tick this box to confirm you have reviewed the above.
What version of ripgrep are you using?
master (de4baa1)
How did you install ripgrep?
I am able to reproduce this on
master
.What operating system are you using ripgrep on?
Arch Linux (64-bit, x86):
Describe your bug.
WalkParallel::visit()
will nondeterministically hang if theParallelVisitor::visit()
implementation panics. This also occurs when providing a closure toWalkParallel::run()
.What are the steps to reproduce the behavior?
I am working on a minimal fix to this right now (see https://github.com/BurntSushi/ripgrep/compare/master...cosmicexplorer:ripgrep:parallel-panic-quit?expand=1), but I think it will require a change to the parallelization methods we use from crossbeam, so I wanted to create this issue beforehand. I am currently able to reproduce this with the following:
Apply the above diff, then run the following command:
I typically have to run this many many times on my very fast laptop to get it to trigger. More reliably, you can run:
And running all the tests at once makes this trigger much more reliably.
What is the actual behavior?
The process hangs, typically somewhere in the
Worker::get_work()
method. If I add a print statement within this sub-loop
:ripgrep/crates/ignore/src/walk.rs
Lines 1695 to 1707 in de4baa1
This especially seems to occur more often if the panic handler is non-trivial: I first noticed this when using
color-eyre
, which prints the panic to the console after composing a backtrace.What is the expected behavior?
The
WalkParallel::visit()
method should not hang, and should instead propagate the panic.My first attempt to solve this (see cosmicexplorer@246c162) set the
quit_now
flag if any worker panics, and checkedself.is_quit_now()
within the sub-loop
. This seems to have made it repro less often, but I am still able to get it to repro regardless, even without modifying the panic handler.There are some other changes I want to propose to the
ParallelVisitor{,Builder}
traits as well which may make this easier (such as propagating unwind safety), but I'm currently hoping to solve this without making any changes to the external interface. This may require changing the parallelism primitives we use, but maybe not.The text was updated successfully, but these errors were encountered: