Skip to content

Autocomplete seg faults when attempting to autocomplete np on same statement as import #5878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mofojed opened this issue Jul 31, 2024 · 7 comments · Fixed by #5954
Closed
Assignees
Labels
autocomplete bug Something isn't working python
Milestone

Comments

@mofojed
Copy link
Member

mofojed commented Jul 31, 2024

Description

When autocomplete tries to complete after typing the command:

import numpy as np
np.

it seg faults.

Steps to reproduce

  1. Manually type the following command in the console input, but do not hit Enter:
import numpy as np
np.

In other words, the way I am typing the above is: import numpy as np<esc><shift+enter>np.<wait-for-crash>

Expected results

  1. Autocomplete should appear for np

Actual results

  1. Server seg faults. UI disconnects. Error is outputted:
Server started on port 10000
Deephaven is running at http://localhost:10000/?psk=1gk436ha6s5hu with authentication type io.deephaven.authentication.psk.PskAuthenticationHandler
Press Control-C to exit
[1]    829510 segmentation fault (core dumped)  JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 deephaven server

For some reason my core dump file is empty.

Additional details and attachments

Screencast.from.2024-07-31.02.25.03.PM.webm

For some reason my core dumps are empty. Wasn't sure why, if you need me to produce a core dump I'll look into it further.
Workaround is to just do a separate command import numpy as np and run that, then after that autocomplete seems to work fine for subsequent commands involving np.

Engine Version: 0.35.2-SNAPSHOT
Web UI Version: 0.85.3
Java Version: 11.0.23
Barrage Version: 0.6.0
Browser Name: Firefox 128
OS Name: Linux
@deephaven/js-plugin-ui: 0.18.0
@deephaven/js-plugin-plotly-express: 0.9.0
@deephaven/js-plugin-matplotlib: 0.4.1

@mofojed mofojed added bug Something isn't working triage labels Jul 31, 2024
@niloc132
Copy link
Member

Some initial investigation: @mofojed can also reproduce when running from gradle and from docker. On my machine however, I can only reproduce when running from docker.

I suspect some wheel in my local (non-docker) venv is out of date, thus avoiding the issue, but we haven't checked if this is a regression yet.

@niloc132
Copy link
Member

niloc132 commented Jul 31, 2024

It looks like this is specific to numpy 2.x - downgrading this to something like 1.26 appears to resolve this.

pip install numpy==1.26  

Also, it looks like this snippet is sufficient:

import numpy
numpy.<ctrl-space>

@niloc132
Copy link
Member

niloc132 commented Aug 1, 2024

Filed davidhalter/jedi#2020 with jedi to correctly import without a RecursionError.

However, it seems that a recursion error doesn't safely pass through jpy into Java, but instead causes this seg fault. I'll file that when I have a simple test case.

I tried to add a workaround the jedi issue by pre-loading numpy, but it doesn't seem to work - until the numpy/np etc symbol is in your globals(), it seems it doesnt fix the autocomplete cache:

from importlib.metadata import version
if version("jedi") == "0.19.1" and version("numpy").startswith("2.0"):
    # This appears unused, but we need to force the module to load
    import numpy
    preload_module("numpy")

I'll be adding a separate workaround in our autocomplete logic that will catch exceptions from jedi and return something safe for Java, and possibly limit how many results we read back (so that some long tail result like numpy.finfo doesn't poison the rest of the results).

@rcaudy rcaudy removed the triage label Aug 1, 2024
@rcaudy rcaudy added this to the 0.36.0 milestone Aug 1, 2024
@niloc132
Copy link
Member

niloc132 commented Aug 1, 2024

Filed jpy-consortium/jpy#157 with jpy - apparently a higher than default recursionlimit must be used, but jedi sets recursionlimit to 3000 on startup:

https://github.com/davidhalter/jedi/blob/91ffdead3291263a356a66e40c7e82cfa431107f/jedi/api/__init__.py#L46-L48

@niloc132
Copy link
Member

niloc132 commented Aug 6, 2024

I've closed the jpy issue, and there is little or nothing we can do from DHC as far as I can tell - there seems to be a case where Python 3.10.x will segfault in some circumstances instead of a RecursionError. So far, this does not happen on arm64 (mac or linux), and has not been observed on Python 3.11 or 3.12.

Plain python doesn't seem to have the problem either, which is why I initially pointed the finger at jpy, but after some time with gdb, the segfault itself is coming from inside cpython itself, so can't reasonably be handled by jpy or DHC. It isn't that the RecursionError is being turned into a seg fault, its that sometimes we get a segfault in this situation rather than the expected RecursionError.

I don't think there is much we can do to mitigate this further, except to assist with a fix for jedi, and encourage users to update, plus encourage using Python 3.11 or above.

@niloc132
Copy link
Member

niloc132 commented Aug 6, 2024

Affected:
Python 3.9 and 3.10

Not affected:
Python 3.8, 3.11+

@niloc132
Copy link
Member

niloc132 commented Aug 8, 2024

I've found we do have one more mitigation option - the default python recursion limit is 1000, and jedi automatically raises this to 3000. It looks like the bug doesn't occur if that value is around 2000 or less - in the jpy bug I reported that it wouldn't happen in that example code with 2367, but in the DHC server, I still see a segfault with that value.

Proposed mitigation: detect the current python version, the current numpy version, and the current jedi version. If these match our buggy versions, we can drop the recursion limit to 2000, and potentially warn (once?) if the user's code tries to raise the limit again.

niloc132 added a commit to niloc132/deephaven-core that referenced this issue Aug 23, 2024
…segfault (deephaven#5954)

Simple mitigation for jedi setting the recursion limit too high for
Python 3.9 and 3.10 to correctly handle RecursionErrors. This prevents a
segfault for a known bug in jedi, and only applies in cases where we
know the bug can take place.

Technically it is possible for these python versions to crash in this
way without autocomplete or numpy, but it first would require the
recursion limit to be raised, so the fix only applies when autocomplete
is used.

The `deephaven_internal.autocompleter` module has a
`set_max_recursion_limit` method to define a different max recursion
limit for an application, in case the default of 2000 is not sufficient
for all cases.

Fixes deephaven#5878
devinrsmith pushed a commit that referenced this issue Aug 23, 2024
…segfault (#5980)

Simple mitigation for jedi setting the recursion limit too high for
Python 3.9 and 3.10 to correctly handle RecursionErrors. This prevents a
segfault for a known bug in jedi, and only applies in cases where we
know the bug can take place.

Technically it is possible for these python versions to crash in this
way without autocomplete or numpy, but it first would require the
recursion limit to be raised, so the fix only applies when autocomplete
is used.

The `deephaven_internal.autocompleter` module has a
`set_max_recursion_limit` method to define a different max recursion
limit for an application, in case the default of 2000 is not sufficient
for all cases.

Fixes #5878
Backport #5954
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autocomplete bug Something isn't working python
Projects
None yet
3 participants