-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --overlay and related options #547
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a Signed-off-by
to your commit(s) to indicate that you agree to the Developer Certificate of Origin terms.
@@ -332,6 +340,11 @@ usage (int ecode, FILE *out) | |||
" --ro-bind SRC DEST Bind mount the host path SRC readonly on DEST\n" | |||
" --ro-bind-try SRC DEST Equal to --ro-bind but ignores non-existent SRC\n" | |||
" --remount-ro DEST Remount DEST as readonly; does not recursively remount\n" | |||
" --overlay-src SRC Read files from SRC in the following overlay\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overlayfs terminology for this seems to be that it is the "lower directory". I think it might be better to stick to the overlayfs terms instead of inventing our own, perhaps like this:
--overlay-lower LOWEST [--overlay-lower LOWER...] --overlay UPPER WORKDIR DEST
--overlay-lower LOWEST [--overlay-lower LOWER...] --ro-overlay DEST
--overlay-lower LOWEST [--overlay-lower LOWER...] --tmp-overlay DEST
Alternatively, if we are going to swap the order so that it's the same as lowerdir
(uppermost first, lowest last), perhaps it could look like this, with the first argument "opening a transaction" and --overlay-mount
"committing" it?
# Read-write
--overlay-upper UPPER WORKDIR [--overlay-lower LOWER...] --overlay-lower LOWEST --overlay-mount DEST
# Read-only
--ro-overlay [--overlay-lower LOWER...] --overlay-lower LOWEST --overlay-mount DEST
# Shortcut for tmpfs upper
--tmpfs-overlay [--overlay-lower LOWER...] --overlay-lower LOWEST --overlay-mount DEST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so the decision to order bottom-first and the decision to change the terms used were linked.
I started with something very similar to what you're suggesting here, reusing overlayfs terminology and ordering. Moving away from the overlayfs syntax (colon-separated paths, as used in #167) prompted me to rethink this. Everything else in the bubblewrap CLI is more consistent with bottom-first:
- The top of the documentation states that later options win out over earlier ones unless otherwise specified.
- Filesystem operations proceed from left to right, so first you bind
/
and then you bind top-level directories and then directories inside that and so on.
For people not already used to thinking in overlayfs option strings, I thought that starting at the bottom would feel more of a piece with the rest of bubblewrap.
And having chosen that, I didn't want to have to explain in the documentation that while the kernel expects ‘lower directories’ to be provided top-first, we expect ‘lower directories’ bottom-first—the inexperienced reader would be learning about two things, only one of which is relevant to their use of bubblewrap. I thought—and I feel less confident in this choice than the other one—that changing the terminology would cut that knot. I figured it would suggest to people that know overlayfs that we aren't directly exposing the overlayfs lowerdir property that they're used to, without adding a distraction for the people who don't know overlayfs and just want to learn how to use bubblewrap. I went with src/source because that's the term already consistently used in the bubblewrap documentation to refer to host paths that are being mounted somehow into the container.
This is all pretty superficial and I don't want to fight you too hard on it; I'd rather have either of these designs implemented than none of them. Let me know if you disagree with my reasoning and I'll revise it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reasoning does make sense, I'll think some more about this.
This option can be used multiple times to provide multiple sources. | ||
The sources are overlaid from bottom to top: if a given path to be | ||
read exists in more than one source, the file is read from the last | ||
such source specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right to think that this is the opposite of the order seen in mount -o lowerdir
? If it is, then I think the documentation should say so, perhaps something like this:
This option can be used multiple times to provide multiple sources. | |
The sources are overlaid from bottom to top: if a given path to be | |
read exists in more than one source, the file is read from the last | |
such source specified. | |
This option can be used multiple times to provide multiple lower | |
directories, with the lowest directory specified first and the | |
uppermost directory last. If the same path exists in more than | |
one of the directories, the file is read from the last directory | |
specified. Note that this is the reverse of the order used in | |
the <literal>lowerdir</literal> mount option. |
But I think it might be less confusing if we order the arguments from uppermost to lowest, consistent with the order used in mount options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Connected to this thread and will probably resolve with it.)
@@ -523,4 +523,100 @@ echo "PWD=$(pwd -P)" > reference | |||
assert_files_equal stdout reference | |||
echo "ok - environment manipulation" | |||
|
|||
if test -n "${bwrap_is_suid:-}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part will also need to be skipped on older kernels where unprivileged mounting of overlayfs is not allowed. I'm not immediately sure what the best way to detect that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither. It seems like different distributions may have started supporting this at different times—I've seen some posts that suggest Ubuntu was an early adopter—so I don't think a simple uname -r
version check would suffice. But I don't know a good way of detecting this capability other than trying it, which is a real pain without a tool like, well, bubblewrap. 😐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added what I hope is the right check for this. I checked that the ‘no kernel support’ branch is reached on a VM installed from alpine-virt-3.14.9-x86_64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks plausible, at least. I should probably check this in my Debian CI environment (which involves an assortment of different kernel versions and container technologies) before merging, to make sure we don't have a bug similar to the one fixed in #554.
The commit message should have |
742d3af
to
f995ca0
Compare
f995ca0
to
09429c6
Compare
bc8044a
to
23ff0f8
Compare
The only outstanding thread here should be getting feedback on my reasoning for option ordering and naming. Please let me know what you think and if there's anything I can do to move this forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this. A re-review is on my list, but the PR is non-trivial and my list is not short.
@@ -332,6 +340,11 @@ usage (int ecode, FILE *out) | |||
" --ro-bind SRC DEST Bind mount the host path SRC readonly on DEST\n" | |||
" --ro-bind-try SRC DEST Equal to --ro-bind but ignores non-existent SRC\n" | |||
" --remount-ro DEST Remount DEST as readonly; does not recursively remount\n" | |||
" --overlay-src SRC Read files from SRC in the following overlay\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reasoning does make sense, I'll think some more about this.
@@ -523,4 +523,100 @@ echo "PWD=$(pwd -P)" > reference | |||
assert_files_equal stdout reference | |||
echo "ok - environment manipulation" | |||
|
|||
if test -n "${bwrap_is_suid:-}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks plausible, at least. I should probably check this in my Debian CI environment (which involves an assortment of different kernel versions and container technologies) before merging, to make sure we don't have a bug similar to the one fixed in #554.
I ran into a race when running two bwrap processes consecutively using the same overlay, and the kernel has On my Arch Linux system, this example: #!/usr/bin/env sh
set -eu
rm -rf $HOME/bwrap_ofs_cleanup_test
mkdir $HOME/bwrap_ofs_cleanup_test && cd $HOME/bwrap_ofs_cleanup_test
mkdir -p rw wrk
bwrap --unshare-user --unshare-pid --dev-bind / / --overlay-src "$PWD" --overlay "$PWD"/rw "$PWD"/wrk "$PWD" \
dd if=/dev/zero of=dummy bs=1M count=10 status=none
bwrap --unshare-user --unshare-pid --dev-bind / / --overlay-src "$PWD" --overlay "$PWD"/rw "$PWD"/wrk "$PWD" \
echo SUCCESS Fails with this error:
And the error on
Inserting a short sleep or a CauseI have taken a look and I believe that the cause is: Since I'm using Trying
|
Honestly, this is what I thought Maybe it could be enabled by a separate flag? I would imagine that waiting for everything in the sandbox to be well and truly finished is a feature that would be useful in other cases. |
TBH I thought the same, but you can see that that's not the case by running I guess it could be useful if you are launching something that daemonizes, but it's going to be trouble for anything that locks a resource like the overlayfs inode index. So I also think that having flag(s) to wait or kill all processes instead of just for the main process could be useful for those scenarios. Also, I suppose the |
FWIW: if pid1 exists (no matter how) everything in the pid namespace will get a SIGKILL. I.e. there is no way pid1 has exited and deamons are still running. Of course you only have a pid1 when you use unshare-pid. |
You mean if pid 1 exits, I think (unfortunately either word makes sense here, but only one is correct). Yes, if we are unsharing the pid namespace for the sandbox, when pid 1 within that namespace exits (that's the user-specified process if you used However, I don't know whether they are killed synchronously before pid 1 is allowed to exit, or whether pid 1 can exit while the overlayfs resource is still in use...
If there is any situation in which the reaper process would not exit promptly, then that would make sense as an opt-in thing. Or if the reaper process would always exit promptly in any case, then it might as well be on by default, in all cases where the reaper process gets run. (In situations where the reaper is not run, of course we must not wait for it.) If this is likely to be a practical problem, it might make sense to have an Or, you might find that your use-case is better served by creating a long-running sandbox containing an IPC server, and then poking commands into it, analogous to |
If I understand the relevant terms correctly, we know that the reaper process doesn't exit until all children have terminated (the
It might indeed make sense to have this sort of option (as well as for the other options overlayfs exposes—there are a few of them), but I'm reluctant to design and implement them without an actual need. Turning off the index for this issue is more of a workaround for not having |
I found the problem while trying to create a sandbox wrapper over an existing command line tool ( Perhaps my specific use case is a bit obscure, but I think the idea of being able to use overlays in drop-in tool wrappers that are used from scripts is something that ought to work.
I agree that creating a new issue/PR is the right approach, ultimately the cause of the problem I ran into is not related to this PR, but rather a more general issue with the exit/cleanup code and locked resources, and the conditions to run into it are pretty specific.
Agreed, I think bwrap should strive for simplicity and turning off the overlayfs index is just a workaround. If someone needs to tune their mounts for some specific use case, this can be done like in #412 (comment). |
23ff0f8
to
31d8b9e
Compare
A few nits but this looks great! Just from a simplicity perspective I think it would be easier to follow what's going on if the ops were added directly and then processed in The other big thing is how a user like flatpak should test for availability of this feature. How is this done with other features? |
@smcv, in #596 (comment) you said ‘solving its remaining issues’. As far as I'm aware the only outstanding issue here is your decision on whether the design choices we discussed in #547 (comment) are acceptable. Is there anything else for anyone to do? Let me know and I'll be happy to work on it. |
FWIW, the order as implemented right now makes a lot of sense when you build the command line for it from a program like flatpak. |
I have a flatpak branch now which is basically doing Letting bwrap chown the directory doesn't work and fails with EINVAL. For me
or just bind mounting it without the tmp overlay like this:
In both cases $masking-dir contains a the file Unfortunately the masking dir layer is on the oldroot. What I want, really is a way to add a layer from the newroot:
This also works for bind mounting:
Long story short, I think an option to create an overlayfs layer from a directory from the newroot would be really helpful. |
Seems like the regression is getting fixed after all. https://lore.kernel.org/linux-unionfs/[email protected]/ |
Good news in the long-ish term (whatever window of time, if any, exists between ‘6.5 is too old for bwrap to care about supporting’ and ‘bwrap is ported to use the new mount API, in order to support filesystem user mapping or whatever’), but unfortunately that doesn't change anything for this PR. |
6.5 is supposedly getting fixed as well. If the fix lands in 6.5 it should be fine to go ahead with your original escaping code. |
I'm running into this issue: |
@rhendric it landed in 6.5.8. Revert the escaping changes and maybe pull in my two commits and then try for another round of review? |
The issue @patlefort mentions appears to be due to https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.5.y&id=56f16bda27aabc5be062e80d20fe78c2417f392a (included in Linux ≥6.5.8). So it looks like the branch in |
@patlefort, I'll be reverting most of the changes in the second commit of this PR after I have a Linux 6.5.8 VM on which to test. Until then, if you use just 2b1f37e, you shouldn't see that issue. |
It works unless I use |
‘Overlapping overlay layers are not supported and can cause unexpected behavior’ So using |
Figuring out when two layers are overlapping is a hard problem that adds a ton of complexity. There are also lots of other restrictions with overlayfs. In general bwrap doesn't try to validate that the kernel op won't fail but rather just tries and if something goes wrong aborts. |
By ‘this case’ I meant someone literally providing |
Why this error exactly and not any others? This is a low-level tool not meant to be used by end-users. |
<shrug> Being a low-level tool doesn't mean paying zero attention to usability. I don't know whether this project would want to add a few lines of code to cover this particular gotcha that came up in testing, which is why I'm asking the question of whether it's worth it and not just pushing code that does it. |
It did take me by surprise so maybe adding a word about it in the manual could help. |
6d85ee1
to
6dddda5
Compare
We should probably set the $ mkdir lower upper work merged
$ mkdir lower/directory
$ bwrap --dev-bind / / --overlay-src "$PWD"/lower/ --overlay "$PWD"/{upper,work,merged}/ \
sh -c 'rmdir merged/directory && mkdir merged/directory'
mkdir: cannot create directory 'merged/directory': Input/output error Podman also does it as well for unprivileged/rootless containers, see: https://github.com/containers/storage/blob/a0eeb0781129973c70d2cb101b00e312631a5702/drivers/overlay/overlay.go#L1723 |
6dddda5
to
06bf460
Compare
I've rebased onto the 0.9.0 release and implemented the @smcv, any chance this can get your eyes again? |
Sorry, this got preempted by a security vulnerability in Flatpak which required a new bubblewrap feature to be added under embargo. Please could you rebase onto 0.10.0? Is this otherwise in a ready-to-review state, or are there implementation concerns remaining? Please mark the threads that various people have raised as "Resolved", if you believe those concerns have been addressed (or if you can't do that because you aren't a maintainer, the next best thing is to reply to the thread and say "Resolved"). |
Originally I was trying to ensure some level of isolation, but now I realize that's contrary to my original goal: a clean $HOME. Given that, I have changed the bwrap to bind all relevant mounts from root and the user home. In order to prevent propagation of the managed configs, directories are recursively bind mounted, which at the moment is incurring a performance penalty, but may be improved if containers/bubblewrap#547 gets merged.
This commit adds --overlay, --tmp-overlay, --ro-overlay, and --overlay-src options to enable bubblewrap to create overlay mounts. These options are only permitted when bubblewrap is not installed setuid. Resolves: containers#412 Co-authored-by: William Manley <[email protected]> Signed-off-by: Ryan Hendrickson <[email protected]>
Signed-off-by: Ryan Hendrickson <[email protected]>
Signed-off-by: Ryan Hendrickson <[email protected]>
Signed-off-by: Ryan Hendrickson <[email protected]>
Signed-off-by: Ryan Hendrickson <[email protected]>
15eb0b5
to
3f10549
Compare
Rebased and ready for review. |
This commit adds --overlay, --tmp-overlay, --ro-overlay, and --overlay-src options to enable bubblewrap to create overlay mounts. These options are only permitted when bubblewrap is not installed setuid.
This is a continuation/partial rewrite of #167, addressing the feedback given there.
Other improvements of note:
--tmp-overlay
is a new contribution for a use case I frequently have: I want to mount an overlay that is writable but I don't want to persist the writes between runs.