-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault (msgpack) #27859
Comments
Can confirm, I just encountered this at the current HEAD a6b6d03. |
Can you build Nvim with ASAN and get an ASAN log? |
Commit: When crash (interesting fact - sometimes the tmux session with Alacritty dies)
Last logs:
Sometimes nvim exit with 141 error (so asan not created), when i build in Debug mode, I See logs:
|
before investigating this more, can someone confirm they still see this after #27871 ? |
@bfredl everything seems ok. I will reopen the issue if the crashes happen again |
@bfredl I'm reopening this issue because some crashes occurred.
|
We just got this bug report in Neovide, that might be related, but it's too early to tell for sure Based on the logs, it looks like some rpc data is lost, and therefore we get the parameters of |
@bfredl, reverting to 6525832, fixes the issue in Neovide, so I'm fairly sure that neovide/neovide#2452 is the same bug, we just detect the error and report it earlier than the terminal version. We have the following log:
So, no flush is revived before the response to |
I'm also encountering a similar issue, bisected to conclude that #55c9e2c introduced this. I can only reproduce the crash under tmux ( When running a debug build, instead of neovim/src/nvim/msgpack_rpc/unpacker.c Lines 460 to 462 in 57adf8c
which I suppose is indicative of garbage data (the values of both It seems like most such crashes occur more or less as follows (all in the TUI process):
The "garbage" state of the (1970302569).to_bytes(4, 'little')
# >> b'inpu'
# i.e. inpu(t) ? which could be some memory corruption due to the |
Setting up a memory watch point at @bfredl I think I've figured out what's causing this. Here is the specific sequence of events that I think causes this bug: After a successful and complete redraw (state neovim/src/nvim/msgpack_rpc/unpacker.c Lines 359 to 368 in 603f3b3
Then in neovim/src/nvim/msgpack_rpc/channel.c Lines 288 to 298 in 603f3b3
But now However, if the read is incomplete (see my previous comment, seems to occur when size > For example the TUI could send some input (in the case of the debugging session I mentioned above, the TUI sends a focus event) which uses data allocated with Making the NOTE: The bug is pretty hard to reproduce consistently. I usually need to have |
@bfredl hello, I build from master and this error is not fixed |
@qRoC as always, need more details than that. exact commit? stacktrace? steps to reproduce? |
@justinmk As I said in the first message, sigfault does not always occur (often 130, 139, 141). At the moment, the case that consistently caused the problem (scrolling the main buffer) has been fixed. But I caught 139 while scrolling the float window and at the moment I can't reproduce it, but I'll keep you posted. |
In Neovide we also still have some reports of crashes. Here's one including a log file from the Neovim side Based on a very surface level analysis it looks like there's a race condition between processing RPC messages from Neovide, sending of the UI flush messages, so the responses get intermixed. It was also confirmed again that #55c9e2c commit that breaks it. My humble opinion is that it should be reverted for the 0.10 release at least, since it's just an optimization without any profiling data to back up the need for it. |
I also believe that the fix by @theofabilous, was just a band-aid fix on the receiving side, while the actual problem occurs on the sending side. |
No, it is not. Logs were just published 15 minutes ago. We do not revert stuff in a panic immediately a few minutes after there is any data indicating there is an issue in the first place.
Things can be actual problems even if they don't affect you personally. This was a bug in the TUI client. |
A hypothesis is that we lack enough synchronization between UI packing and non-UI packing (might need to flush UI messages if we have a reason to send a notification in the middle of rendering) |
Just out of curiosity, what does it fix, other than optimize things, did someone report performance problems? Reverting something just before a release that has been open for two months without a proper fix, is not a panic revert. But if #27655, indeed fixes something important, then I understand.
Please, don't unless you absolutely have to. We already have a lot of problems with the atomicity of the UI protocols, with partial redraws and the mouse getting sent to random positions for example. |
?? details that identifies this particular crash was reported one hour ago, not two months ago. We will not revert a lot of work in a panic before anyone even has had the change to put together a fix (which is what I am working on).
Oh, I don't mean introducing any more interleaving than already is. This is just about fixing the data corruption bug that you are seeing with the existing message ordering.. |
This bug is two months old, and we have reported the same information twice. There were no Neovim logs, but based on the Neovide logs you can draw the same conclusion as you did here, the Neovim side of the logs adds almost no additional information and no-one even asked for them. It also looks like it's not properly fixed, even for the terminal #27859 (comment) With the release of 0.10 scheduled for tomorrow, it does not seem worth trying to squeeze in a risky fix, especially if the real-world benefits of #27655, are questionable. The bug can and probably should be fixed after the release of 0.10. |
We can always backport fixes. Releases are only blocked on topics that are "one way doors", not on things that can be fixed in an easily-backwards-compatible way. |
And we appreciate the reports, but please let us maintain our project according to our priorities. We are not telling you how to run Neovide, either. You should also be aware that latching on to an existing (and not identical) bug report will not give you the same visibility as a dedicated issue that provides the necessary information in that issue (not behind a link). |
The problem is that this causes a lot of crashes together with Neovide, so it very much affects us too. And I think the reports we have got this far are only the top of the iceberg. Can you motivate why #27655 is more important than fixing those crashes? |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry to kick off a minor war over in this issue with my report to Neovide. After seeing a commit by @bfredl (91a4938) and testing it, I am thus far unable to cause any crashes on Neovide in the last 30 minutes of doing my level best do so where previously I could reliably get >20 crashes in that period. @fredizzimo if anyone else notices crashes, ask them to try a Neovim commit after that and see if they experience crashes then -- feel free to ping me as well on those issues and I'll try to repro on my side. I'll keep y'all posted if I do encounter another crash over the next 2 - 3 days so this issue can hopefully be closed once and for all. @bfredl Unfortunately, I'm not smart enough to understand precisely how Line 781 in c795835
guards against this issue in the If you don't have a moment, no problem and either way thanks for looking into this 🙂. |
@PriceHiller Thanks for the report and the detailed investigation. It was a buffer overflow. We calculate the maximum possible size of the next item to be emitted to the buffer but the calculation was wrong (now |
Thanks, I asked for testing in all three issues we have open for this: |
Moving to 0.10.1 milestone. #28717 possibly fully resolves this, but even if it does not the behavior was (apparently) present in 0.9.5 as well, so is not a regression. If more fixes are needed they can be backported in 0.10.1, otherwise we can safely close after hearing results from Neovide users. |
I still haven't had any crashes with Neovide since the merging of #28717 if that helps. We'll have to see I guess if any other users face a crash. |
Still no crashes to report. I think it's pretty safe to say that #28717 resolved this.
Unfortunately I'm the only really active Neovide user on this issue, so not exactly a big sample size 😅. |
That is good enough for us, though :) Let's close this for now; we can always reopen if the (same!) issue pops up again, or open a new issue if the details are different. Thank you! |
If that happens, please ping me. I'll gladly try to debug from my side for something similar or other msgpack related issues. Neovim has given me a lot of value -- so I'm more more than happy to be a gineau pig. |
I have encountered this issue literally hourly with 0.9.5 that I've been running for the last few months. Now with 0.10.0 now issues so far, and it was quite easy to reproduce on my side before. Because of that I actually moved to some IntelliJ IDE for some time, because it was unbearable. But now I'm happy to be only back in Neovim, yay! So thank you! |
Problems
chan_close_with_error:560: RPC: (null)
)Steps to reproduce
Scroll any file
Expected behavior
No crash/errors
Neovim version (nvim -v)
NVIM v0.10.0-dev-2582+g2a8cef6bd-Homebrew
NVIM v0.10.0-dev-2593+ga6b6d036b-Homebrew
NVIM v0.10.0-dev-2536+g55c9e2c96 <-- broken commit
Vim (not Nvim) behaves the same?
no
Operating system/version
macOS 14
Terminal name/version
Alacritty
$TERM environment variable
xterm-256color
Installation
brew
The text was updated successfully, but these errors were encountered: