"pivot_root: Invalid argument" when running on a SLURM cluster node from NFS #594

Open
dmikushin opened this issue Sep 11, 2023 · 12 comments

Comments

@dmikushin

When running bwrap inside a job on a SLURM cluster node, I get the following error:

$ srun user  
bwrap: pivot_root: Invalid argument

It's highly desirable to let bwrap pass through srun, because then bwrap would also work for cluster jobs!

@smcv
Collaborator

smcv commented Sep 11, 2023

a SLURM cluster node

Sorry, I have only the vaguest idea of what this is. If the only thing the kernel is willing to tell us is "Invalid argument", then this is unlikely to be solvable without someone with root access on a suitable system digging into the exact situation.

bubblewrap's use of pivot_root is known not to be possible when inside a chroot environment (#135). If SLURM cluster nodes involve running user-supplied code in a chroot or container, there is probably nothing that bubblewrap can do to solve this: if the cluster node does not allow bubblewrap to make the syscalls that it requires to do its job, then we can't do impossible things.

It's highly desirable

I'm sure it is, but that desire doesn't make it possible for bubblewrap to do things that the kernel won't allow.

@smcv changed the title from bwrap: pivot_root: Invalid argument to "pivot_root: Invalid argument" when running on a SLURM cluster node (Sep 11, 2023)
@smcv
Collaborator

smcv commented Sep 11, 2023

As a general thing, if there is key information about your system that is unusual and likely to be part of the root cause for an issue, please make sure to mention it in the issue title. Unfortunately it's quite common for the only information available to be pivot_root: Invalid argument, and we don't want people who get that error message for a completely different reason to be jumping onto this issue, because mixing up multiple root causes on one issue report makes it confusing and time-consuming to disentangle.

@dmikushin
Author

The failing call if (pivot_root (base_path, "oldroot")), with base_path="/tmp", gives me a feeling that this setup is optional. Is there a way to disable it, along with the related surrounding code?

@dmikushin
Author

The failure happens even with the minimal possible setup:

(gdb) r --bind $HOME/chroot / bash

@smcv
Collaborator

smcv commented Sep 11, 2023

gives me a feeling that this setup is optional

No, the use of pivot_root is a necessary part of how bubblewrap does what it does: you can't change the root directory without changing the root directory. /tmp was just a convenient path that (we assume) is guaranteed to exist, because we need somewhere to put a temporary mount point during a transitional state while we reorganise the mount point hierarchy.

Old versions of bubblewrap used chroot(2), but that prevented recursive use of bubblewrap (bubblewrap inside bubblewrap), and also led to tools outside the container seeing misleading paths starting with /newroot when inspecting processes inside the container.

@dmikushin
Author

dmikushin commented Sep 11, 2023

Old versions of bubblewrap used chroot(2)

Oh, good to know. That matches my other experiment with a fakeroot/fakechroot pair: those tools don't need pivot_root, and they worked well in the same SLURM environment, but they are not as nice or as modern as bubblewrap.

@dmikushin
Copy link
Author

I've found that bubblewrap works when my root folder is on a local disk. On the cluster's compute node, however, that same disk is mounted via NFS, and this seems to be the actual reason why bubblewrap fails: it does not like NFS. Is pivot_root() unavailable on NFS, or on network file systems in general?
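To confirm which filesystem a given directory (for example `/`, or the directory passed to `--bind`) actually lives on when running on the compute node, a minimal sketch using the Linux-specific `statfs(2)` call is shown here. The helper name `path_is_nfs` is hypothetical and not part of bubblewrap; the magic number `0x6969` is the value of `NFS_SUPER_MAGIC` from `<linux/magic.h>`, defined locally only so the snippet does not depend on kernel headers being installed.

```c
#include <sys/vfs.h>   /* statfs(), struct statfs (Linux-specific) */

/* Magic number reported in f_type for NFS mounts; same value as
 * NFS_SUPER_MAGIC in <linux/magic.h>. */
#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969
#endif

/* Hypothetical helper: returns 1 if `path` is on an NFS mount,
 * 0 if it is on some other filesystem, and -1 if statfs() fails
 * (errno is left as set by statfs). */
static int path_is_nfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    return sfs.f_type == NFS_SUPER_MAGIC;
}
```

Calling it on the same directory both on a login node and inside an `srun` job would show whether the working and failing cases really differ in filesystem type, which is a cheaper first step than kernel debugging.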

@dmikushin changed the title from "pivot_root: Invalid argument" when running on a SLURM cluster node to "pivot_root: Invalid argument" when running on a SLURM cluster node from NFS (Sep 12, 2023)
@smcv
Collaborator

smcv commented Sep 12, 2023

pivot_root has some poorly documented restrictions, and NFS is quite an unusual filesystem, so perhaps it runs into one of those restrictions?

@dmikushin
Author

Yes, but Linux can even boot with an NFS root filesystem; that has been known to work for a very long time. This problem is not well known: I found only one related issue.

@rusty-snake
Contributor

rusty-snake commented Sep 12, 2023

FWIW, the documented cases of EINVAL in pivot_root(2) are:

  • EINVAL new_root is not a mount point.
  • EINVAL put_old is not at or underneath new_root.
  • EINVAL The current root directory is not a mount point (because of an earlier chroot(2)).
  • EINVAL The current root is on the rootfs (initial ramfs) mount; see NOTES.
  • EINVAL Either the mount point at new_root, or the parent mount of that mount point, has propagation type MS_SHARED.
  • EINVAL put_old is a mount point and has the propagation type MS_SHARED.
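Several of these cases (a path that is not a mount point, a mount with MS_SHARED propagation) can be checked from userspace without kernel debugging. A minimal sketch, assuming the standard `/proc/self/mountinfo` layout (field 5 is the mount point; optional fields such as `shared:N` or `master:N` start at field 7 and end at a lone `-`). The helper names `parse_mountinfo_line` and `mount_point_status` are hypothetical, not bubblewrap code. It only inspects the mount's own propagation; checking the parent mount as well would mean following field 2 (the parent mount ID), and octal escapes such as `\040` in paths are not decoded.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <stdio.h>
#include <string.h>

/* Parse one /proc/self/mountinfo line: copy the mount point (field 5)
 * into mnt and set *shared to 1 if a "shared:N" optional field is present.
 * Returns 0 on success, -1 if the line looks malformed.
 * Assumption: mnt_len > 0; "\040"-style escapes are NOT decoded. */
static int parse_mountinfo_line(const char *line, char *mnt, size_t mnt_len,
                                int *shared)
{
    char buf[4096];
    char *save = NULL;
    char *tok;
    int field = 0;

    *shared = 0;
    mnt[0] = '\0';
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    for (tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field == 5) {
            snprintf(mnt, mnt_len, "%s", tok);
        } else if (field >= 7) {
            if (strcmp(tok, "-") == 0)   /* end of optional fields */
                return mnt[0] != '\0' ? 0 : -1;
            if (strncmp(tok, "shared:", 7) == 0)
                *shared = 1;
        }
    }
    return -1;  /* never saw the "-" separator */
}

/* Returns 1 if `path` appears as a mount point in /proc/self/mountinfo,
 * 0 if it does not, -1 if the file cannot be opened. *shared is taken
 * from the last matching entry (assumes later entries shadow earlier
 * ones mounted on the same path). */
static int mount_point_status(const char *path, int *shared)
{
    char line[4096];
    char mnt[4096];
    int found = 0;
    int sh;
    FILE *f = fopen("/proc/self/mountinfo", "r");

    *shared = 0;
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_mountinfo_line(line, mnt, sizeof mnt, &sh) == 0 &&
            strcmp(mnt, path) == 0) {
            found = 1;
            *shared = sh;
        }
    }
    fclose(f);
    return found;
}
```

For example, `mount_point_status("/", &sh)` returning 0 would likely point at the "current root directory is not a mount point" case, while `sh == 1` for a relevant mount would point at the MS_SHARED cases. The same information is also visible in the PROPAGATION column of `findmnt`.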

@dmikushin
Author

Anyway, I'm going to propose #595 as a workaround to allow smoother behavior on systems where pivot_root() fails. I think bubblewrap is too good to give up on entirely because of this issue :)

@dmikushin
Author

FWIW, the documented cases of EINVAL

Yes, I've checked: none of these cases seems to apply to my setup. To learn more I would need to do kernel debugging, which I can't do easily: I would have to replicate the entire system locally.


3 participants