Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph-mgr/dashboard: python-cryptography PyO3 modules may only be initialized once per interpreter process #20

Open
bazaah opened this issue Sep 1, 2023 · 9 comments
Labels
bug Something isn't working question Further information is requested

Comments

@bazaah
Copy link
Owner

bazaah commented Sep 1, 2023

Opening this to track progress / news on the upstream project(s). See #16 for the history of this issue.


The basic facts are:

  1. Ceph uses a relatively rare "subinterpreter" model for running mgr modules. The important bit here is that there is a 1:Many relationship between the OS process and Python interpreters.
  2. CPython doesn't really prevent you from (theoretically) sharing python objects between interpreters in the same process, and doesn't really take stance on it beyond "only do this if you know what you're doing", but typically one can assume that bad things will happen if you share Python objects between interpreters and aren't a CPython dev who understands when it is allowed.
  3. PyO3 is a rust project for interop with Python, notably for us, it allows one to expose rust code into native Python, and is used in the popular python-cryptography module
  4. Up until v0.17.0 (or really: pymodule: only allow initializing once per process PyO3/pyo3#2523) the maintainers allowed one to initialize the same PyO3 backed module multiple times per process invocation
  5. Technically, doing so opens soundness holes in rust to Python libraries around globals, as PyO3 doesn't (can't?) prevent module writers from storing Python objects that were implicitly created in one interpreters from being accessed in another (the rust global is unique at the process level).
  6. Therefore, rather than undertaking an intractable redesign, the PyO3 maintainers simply remove the ability to initialize a module more than once in a process.
  7. python-cryptography upgraded to the problem version of PyO3 in v41, which is the current latest in Archlinux: https://archlinux.org/packages/extra/x86_64/python-cryptography/
  8. Therefore, ceph-mgr's dashboard will blow up because the restful module (loaded earlier, mandatory module) uses python-jwt which initializes python-cryptography first

At the moment, I don't see a maintainable path forward without help from the upstream packages.

  • I could attempt to extend the build to use a venv and somehow install that alongside ceph
    • But doing so is a considerable maintenance effort using technologies I'm not super familiar with (CMake, Python)
    • And would be a very large patch to the build system relatively
  • I could attempt to forcefully override just python-cryptography, but this will stop working as so as one of the rdeps required a version greater than v41
  • Or I do nothing, mention that the dashboard is broken and see if I can figure out a way to move the conversation forward upstream
    • The most annoying, as I use the dashboard myself, but ultimately the only choice I think I can support long term.
@bazaah bazaah added bug Something isn't working question Further information is requested labels Sep 1, 2023
@niko2
Copy link

niko2 commented Oct 26, 2023

@bazaah, it's about your basic fact 8)

I am seeing 2 of my 3-node cluster where dashboard is loaded before restful, so dashboard is loaded and works.
How is it possible ?

@bazaah
Copy link
Owner Author

bazaah commented Oct 27, 2023

I've personally never seen that (restful always loads first) but I'm unsure if this ordering is enforced in the code somewhere. Its very possible that for whatever reason, your dashboard loads first, and the restful module aborts

@bazaah
Copy link
Owner Author

bazaah commented Jan 1, 2024

Seems the upstream has become aware of the issue:

However, it appears that this may still not be enough because of bcrypt usages, and moreover, we all still basically need PyO3 to provide a path forward for subinterpreters. That said, I'll probably experiment a bit with building a py-crypto-less dashboard.

@hswong3i
Copy link

hswong3i commented Jan 3, 2024

@bazaah
Copy link
Owner Author

bazaah commented Jan 4, 2024

Good to know, though I'm somewhat limited until the python-bcrypt package updates to the (yet to merged) PR above

@hswong3i
Copy link

hswong3i commented Jan 4, 2024

Good to know, though I'm somewhat limited until the python-bcrypt package updates to the (yet to merged) PR above

Please don't expect bcrypt will accept that PR as pyca/bcrypt#714 (comment) mentioned.

I am creating that PR for upstream bcrypt just because it works with my own OBS packaging pipeline, and I like to share the possible workaround with other else downstream distro packagers.

@bazaah
Copy link
Owner Author

bazaah commented Mar 16, 2024

Yeah, this is now completely broken on arch latest. During my testing for the 18.2.2-1 release:

log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: mirroring, status, volumes, rbd_support, crash, snap_schedule, restful, localpool, osd_perf_query, telemetry, nfs, alerts, rgw, mds_autoscaler, pg_autoscaler, zabbix, devicehealth, balancer, stats, cephadm, rook, influx, osd_support, insights, selftest, diskprediction_local, dashboard, orchestrator, telegraf, prometheus, iostat, progress

I'm going to have to fork either the modules (bcrypt, cryptography) or pyo3 and remove the safety check, as this issue is now intractable

@bazaah
Copy link
Owner Author

bazaah commented Mar 16, 2024

So, with a two line change to pyo3, removing the import error, and the equivalent of rebuilding python-bcrypt with:

diff --git a/src/_bcrypt/Cargo.toml b/src/_bcrypt/Cargo.toml
index a9c7f7c..02317c8 100644
--- a/src/_bcrypt/Cargo.toml
+++ b/src/_bcrypt/Cargo.toml
@@ -6,7 +6,7 @@ edition = "2018"
 publish = false

 [dependencies]
-pyo3 = { version = "0.20.0", features = ["abi3"] }
+pyo3 = { git = "https://git.st8l.com/luxolus/pyo3", tag = "v0.20.3-subint+1", features = ["abi3", "unsafe-allow-subinterpreters"] }
 bcrypt = "0.15"
 bcrypt-pbkdf = "0.10.0"
 base64 = "0.21.5"

The bcrypt issue affecting all mgr modules has receded. I realize now that ceph v18.2.2 doesn't include ceph/ceph#55689, so I'll need to backport that myself and rebuild.

Unsure how I'm going to expose my python-bcrypt fork, yet.

bazaah added a commit that referenced this issue Mar 16, 2024
Removes the (limited) runtime usages of python-jwt, and therefore,
python-cryptography from the mgr dashboard module.

This should restore dashboard functionality, if used alongside a
patched pyo3 for python-bcrypt.

Upstream-Ref: ceph/ceph@33d8bef
References: ceph/ceph#55689
References: https://tracker.ceph.com/issues/63529
References: #20
References: https://git.st8l.com/luxolus/pyo3/commit/44d6919168621b46e4e884130ca43338655b020c
bazaah added a commit that referenced this issue Mar 18, 2024
Removes the (limited) runtime usages of python-jwt, and therefore,
python-cryptography from the mgr dashboard module.

This should restore dashboard functionality, if used alongside a
patched pyo3 for python-bcrypt.

Upstream-Ref: ceph/ceph@33d8bef
References: ceph/ceph#55689
References: https://tracker.ceph.com/issues/63529
References: #20
References: https://git.st8l.com/luxolus/pyo3/commit/44d6919168621b46e4e884130ca43338655b020c
bazaah added a commit that referenced this issue Mar 18, 2024
- python-bcrypt-allow-subinterpreters.patch
- python-bcrypt-prefix-ceph.patch

Together, these allow us to build a renamed python-bcrypt package, as
ceph_bcrypt.

This will allow us to bypass the thorny pyo3 issues we have with various
mgr modules exploding at runtime.

References: #20
bazaah added a commit that referenced this issue Mar 23, 2024
Removes the (limited) runtime usages of python-jwt, and therefore,
python-cryptography from the mgr dashboard module.

This should restore dashboard functionality, if used alongside a
patched pyo3 for python-bcrypt.

Upstream-Ref: ceph/ceph@33d8bef
References: ceph/ceph#55689
References: https://tracker.ceph.com/issues/63529
References: #20
References: https://git.st8l.com/luxolus/pyo3/commit/44d6919168621b46e4e884130ca43338655b020c
bazaah added a commit that referenced this issue Mar 23, 2024
- python-bcrypt-allow-subinterpreters.patch
- python-bcrypt-prefix-ceph.patch

Together, these allow us to build a renamed python-bcrypt package, as
ceph_bcrypt.

This will allow us to bypass the thorny pyo3 issues we have with various
mgr modules exploding at runtime.

References: #20
@bazaah
Copy link
Owner Author

bazaah commented Jun 11, 2024

It appears this chicken has come home to roost now: https://tracker.ceph.com/issues/64213. I'm curious what the outcome of this will be.

zer0def pushed a commit to zer0def/pyo3 that referenced this issue Nov 8, 2024
Allow the subinterpreter safeguards to be disabled, so that applications
like Ceph's manager can continue to use pyo3 modules without soft
crashing.

Enabling this feature should be done with caution, as any storage of Py
objects in rust statics can lead to undefined behavior.

However, not all consumers of pyo3 use global state, and thus a subset
of them (such as python-bcrypt) are safe to use in subinterpreter
contexts.

References: bazaah/aur-ceph#20
References: PyO3#2523
References: pyca/cryptography#9016
References: PyO3#2346 (comment)
References: PyO3#2346 (comment)
References: PyO3#3451
Signed-off-by: Paul Stemmet <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants