-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmentation fault closing application with multiple isolates and displaying an image #634
Comments
|
A minimal example is an app which displays an image, starts a background isolate, and calls SystemNavigator.pop. See: https://github.com/jslater89/go-flutter-desktop-repro It seems like it must be in the drawing code somewhere, but I haven't quite tracked down what happens on close yet. |
Following some gdb-fu today, the crash happens because a GrGLGpu is used after free in the multiple-isolate case: its destructor runs, then its deleteSync method is called. I'm not sure why, yet, but the message loop stuff in the backtrace seems plausible? One message loop per isolate, wires crossed somehow so that both loops end up with an event that leads to deleting a GrGLGpu shared between threads? I may not have time to dig in any more this weekend, but I'll continue nosing around as I have time. |
I had a little more time to look at things today, and found confirmation that it's related to the message loop/task runner setup: with multiple isolates and an image drawn, EventLoop.PostTask gets called from a non-main thread. Out of time for now, but I have two avenues of attack now: tracing the non-main call to the event loop backward, and working forward from the two-isolates-no-images case. |
Sorry for the late response, I'm out also out of time^^ Line 207 in a180924
| eventLoop := newEventLoop(
glfw.PostEmptyEvent, // Wakeup GLFW
func(t *embedder.FlutterTask) error {
if !a.window.ShouldClose(){
return a.engine.RunTask(t)
}
return nil
}, // Flush tasks
) But it didn't work out! Task thread assignment is handled by the flutter engine framework with: Lines 52 to 56 in a180924
Maybe this requirement isn't meet: Lines 58 to 60 in a180924
|
The errors occurs when we call: go-flutter/embedder/embedder.go Lines 164 to 167 in a180924
|
Ooh, good thought. I have this addition in event-loop.go:
Sometime later this week or this weekend, I'll trace that back into the Flutter source to find out where the event loop/tasker is getting shared between threads, and what we can do on the go side to fix it. (Or, alternately, what the default Linux embedder does.) |
The event-loop thread mismatch may potentially be barking up the wrong tree. Even without spawning extra isolates, the event loop gets called from the io.flutter.ui thread.
|
That was barking up the wrong tree, looking at embedder.h: PostTask doesn't need to be called from the same thread as newEventLoop, so long as the posted task runs on that thread, which is what happens. I think it has to be something in the task scheduling somewhere, going by the fact that Flutter is trying to free the skia image after freeing the GPU. I just can't work out how it's happening yet. |
Yea, IIRC flutter has multiples task runners: https://github.com/flutter/flutter/wiki/The-Engine-architecture#threading We must inform the engine which was the thread used to initialized the graphic context (main thread) and this is done through: Lines 52 to 56 in a180924
This enables the engine to execute task of the ui runner on the go-flutter main thread. Seems Logical.
Yea correct! The comment is wrong. What is bugging me is that the error occurs on FlutterEngineShutdown so no tasks should run after this? Lines 17 to 18 in a180924
|
On second thought, I think you're right—I'd have to step through it with the debugger again, but it does seem like the event loop is no longer looping by the time we get to FlutterEngineShutdown. That is, it's blocked while FlutterEngineShutdown happens, and outside the loop that actually runs tasks. Hmm. Maybe I need to see if I can find where Flutter tears down the root isolate, and make sure that's not happening more than once for some reason? But it doesn't seem like it is, based on breakpoints and what hits them. It does, rather, seem like a simple 'things happening out of order' problem. |
Working forward from startup, it looks like the two threads that see segfaults, in my experience (io.flutter.io and io.flutter.ui) are engine-managed, and created/managed by Flutter. (The only two we're even allowed to provide are the platform and raster threads.)
So whatever's happening is at least partially the Flutter engine's fault, maybe? It doesn't seem to be happening on a thread managed by us, at least. I'll have to look at what the other embedders are doing, too. Maybe we need to flush our message loops on exit prior to shutdown, or something. I noticed that the FlutterEmbedderGLFW example in the flutter engine repository never actually calls FlutterEngineShutdown. That seems like potentially poor practice, but it does keep the crash from happening, so I can continue work on the project I'm using this embedder for in parallel with looking into this bug. |
I did some stepping through the flutter engine shutdown procedure tonight, without much to show for it. There are some bits and bobs in shell.cc that tear things down cleanly. They run the platform and render tasks on the current thread correctly, and queue some work on the UI and IO threads that is also correctly time. Then, after all the teardown is seemingly done, some events shake out of the IO thread's message loop task queue and try to free some Skia image objects after the GPU is gone. I think my next step is going to be two runs of the shutdown process, one with two isolates and one with one, and breakpoints on the particular Skia objects that get torn down late, so I can see where/how the Flutter code interacts with them and why they're in the IO queue late. |
With your previous message I now remember that we are doing something particular in: go-flutter/embedder/embedder_helper.c Lines 52 to 57 in b048f2a
And also: #134 (comment) |
Some lunchtime investigation suggests that letting Flutter manage the render task runner doesn't make a difference, which makes sense. Flutter's teardown code seems to occur in the order I'd expect it to.
Today's lunchtime work says that it has to do with the SkiaUnrefQueue owned by ShellIOManager. (ShellIOManager works on the IO task runner/thread, and its responsibilities seem mainly limited to loading and processing images.) The queue's Drain method is called during the teardown code above, before the GrGLGpu object gets destroyed (I think; it's been a bit), which clears out all the events in it. There's a note that it's the caller's responsibility to ensure that no further unrefs get queued after draining the queue manually, but I think that carries with it the implication that the reason you shouldn't queue unrefs after manual draining is that you've probably also deleted all of the resources the unref needs to succeed. Later on, after the teardown code above completes, another event shows up in io_manager's unref queue. It comes from Picture's dispose method. Picture has a reference to the io_manager unref queue since, as the name suggests, it manages the loaded image. So, for some reason, when tearing down the app with two isolates, the UI is disposed of (or the UI dispose tasks run) after the renderer teardown finishes. Getting closer now! |
I found the first major difference between the background isolate and no-background-isolate case. With no background isolate, With a background isolate, This seems potentially meaningful, but also maybe not our fault? I think my next step is probably going to be opening an issue in the Flutter repository to see if they'll take this seriously. |
Yea, I think we should open an issue on the flutter repo. |
Hover doctor
Problem description
The application experiences a segfault in the io.flutter.ui thread when closed while additional isolates are running. This happens with the debug engine (which comes from Google) as well as profile/release (which we build). The official Flutter desktop embedding doesn't crash in the same circumstance, so despite the lack of anything out of the project binary in the backtrace, it seems to be go-flutter-desktop related, somehow.
I haven't managed to find a minimal reproduction yet—it's happening in a large app that isn't currently in a public repository, and I haven't been able to work out which part is causing the problem. I suspect it'll be easier to narrow down once I get an engine built with debug symbols, but I thought I'd check in here to see if there were any ideas.
The text was updated successfully, but these errors were encountered: