calc: bgsave not working ... #9082
@caolanm you noticed this - the logs support the thesis; but I wonder what creates the extra thread we don't know about. |
FWIW the coolwsd.xml for perf-staging has experimental:true and deepl:true, but that doesn't seem to have any thread-related stuff, so I presume it then has to be some core thread. Add dumping of /proc/self/task/*/comm ? |
Ah - sure; I just need a way to reproduce it; it may well be something like the officecfg config writing thread, or as you say a DeepL / some other thread =) I wonder if those threads set comm though =) Also - it's not possible to access /proc/thread/self/tasks once we dropped capabilities; this is only possible due to holding a directory file-descriptor open from pre-dropping capabilities; so for new directories unknown at that time we can't inspect them at least inside the forkit process itself =( so manual hunting is better for now I think. |
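The directory file-descriptor trick mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the forkit code: `open_task_dir` and `thread_ids` are hypothetical names, and the sketch assumes a Linux-style task directory. The idea is to open the directory while access is still allowed and later list it through the descriptor rather than by path.

```python
import os


def open_task_dir(path="/proc/self/task"):
    """Open the task directory early, i.e. before privileges are dropped."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def thread_ids(task_dir_fd):
    """List thread IDs through the already-open descriptor, not by path."""
    return sorted(int(name) for name in os.listdir(task_dir_fd) if name.isdigit())
```

As the comment notes, this only helps for directories that were known and opened beforehand; a directory that appears later cannot be reached this way.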
Easy to reproduce it seems from staging-perf: $ cat /proc/25814/task/*/comm |
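For repeated checks, the `cat /proc/<pid>/task/*/comm` one-liner can be scripted. A minimal sketch, assuming a Linux procfs layout; `thread_names` and its `proc_root` parameter are hypothetical helpers for illustration, not part of coolwsd:

```python
from pathlib import Path


def thread_names(pid, proc_root="/proc"):
    """Return {tid: comm} for every thread of pid, like cat /proc/<pid>/task/*/comm."""
    names = {}
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            names[int(task.name)] = (task / "comm").read_text().strip()
        except (OSError, ValueError):
            # The thread exited mid-scan, or the entry is not a numeric TID.
            continue
    return names
```

Calling `thread_names(os.getpid())` from inside a process (with `os` imported) would list its own threads; a thread that never set its name shows up under the name it inherited from the thread that created it.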
gdb behaving oddly - but during save at least:
Thread 296 (Thread 0x7f05cd175700 (LWP 26816) "WakeUpThread"):
several: ...
Thread 2 (Thread 0x7f05d4810700 (LWP 25821) "kit_spare_001"):
Ah - and I'm suckered - the save is synchronous because it is manually triggered in this case: bother ... =) |
At the point of auto-saving I have:
Thread 5 (Thread 0x7f05ce177700 (LWP 26232) "kitbroker_001"):
Thread 4 (Thread 0x7f05ce978700 (LWP 26231) "kitbroker_001"):
Thread 3 (Thread 0x7f05cf179700 (LWP 26230) "kitbroker_001"):
Thread 2 (Thread 0x7f05d4810700 (LWP 25821) "kit_spare_001"):
Which looks fine; perhaps we don't shut the watchdog down properly - let me poke at that. |
This is assumed to be either the webdav thread, or perhaps the configmgr thread - it's really rather tricky to decide - both are patched to join nicely: https://gerrit.libreoffice.org/c/core/+/167868 - configmgr. Quite probably there are more to find; let's see ... |
After https://gerrit.libreoffice.org/c/core/+/167858 should there now be a matching getLOKit()->startThreads() in Document::startThreads? #9114 for that thought |
No more instances of: "WRN Failed to ensure we have just one, we have: 2| kit/Kit.cpp:1388" on the staging server since this was deployed; let's assume this is closed then =) |
Interestingly we see this problem while the unit tests are running: https://cpci.cbg.collabora.co.uk:8080/job/github_online_master_debug_vs_co-24.04/1653/consoleText contains the immortal:
kit-4140617-4140617 2024-05-22 17:24:28.849983 +0000 [ kitbroker_002 ] WRN Failed to ensure we have just one, we have: 2| kit/Kit.cpp:1397
in the check before:
which is exciting =) |
A ton of code reading later I came up with this: https://gerrit.libreoffice.org/c/core/+/168032 which I hope together fix this; turns out we start and stop that progress thread a lot, 18 times during the save-torture test for example; can't prove it's the problem - but it seems quite plausible :-) |
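Why a frequently started and stopped helper thread is a plausible culprit can be shown with a small sketch. This is a generic Python illustration of "stop and join", not the LibreOffice patch; `ProgressWorker` is a hypothetical stand-in for such a thread. If `stop()` only signalled the thread without joining it, the thread could still exist at the moment the process checks that it is single-threaded.

```python
import threading


class ProgressWorker:
    """Hypothetical helper thread that is started and stopped repeatedly."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, name="progress")
        self._thread.start()

    def _run(self):
        # Wake up periodically until asked to stop; real work would go here.
        while not self._stop.wait(0.01):
            pass

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            # Joining guarantees the thread is gone once stop() returns.
            self._thread.join()
            self._thread = None
```

With the join in place, the thread count is back to its previous value after every `stop()`, however many start/stop cycles have run.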
It seems background save doesn't always work - sometimes we have a stray thread:
May 16 16:08:42 ip-172-31-14-76 coolwsd[216014]: kit-216014-216014 2024-05-16 16:08:42.530632 +0000 [ kitbroker_148 ] WRN Failed to ensure we have just one, we have: 2| kit/Kit.cpp:1388
And then we cannot save ... but what thread is it? =)
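The kind of check behind that warning can be approximated as below. This is a sketch under assumptions, not the code at kit/Kit.cpp:1388: it assumes the check amounts to counting entries in the process's task directory, and `count_threads` / `ensure_single_thread` are hypothetical names. Only the warning text is taken from the log above.

```python
import os


def count_threads(task_dir="/proc/self/task"):
    """Count numeric entries (one per thread) in a procfs task directory."""
    return sum(1 for name in os.listdir(task_dir) if name.isdigit())


def ensure_single_thread(task_dir="/proc/self/task"):
    """Return True only if exactly one thread exists; warn otherwise."""
    count = count_threads(task_dir)
    if count != 1:
        print(f"WRN Failed to ensure we have just one, we have: {count}")
        return False
    return True
```

A background save relies on forking, and forking a process that still has extra threads running is unsafe, so a result of `False` here corresponds to the "cannot save" case described above.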