Building on local image fails with error mounting tmpfs
#2244
Comments
I see this on Ubuntu 20.04 with apptainer-suid 1.3 too, when trying to build a sandbox from an existing SIF file (both as a normal user and as root):
root@eespot-hpchb176rsv4-1:/shared/home/grma# apptainer --version
root@eespot-hpchb176rsv4-1:/shared/home/grma# apptainer build --sandbox wps_wrf_4p4p2_py3_latest_azure ./wrf_4p4p2_py3_latest.sif
I note that this process works on AlmaLinux 8.7 with a slightly newer apptainer-suid package:
[grma@eespot-hpchb120rsv3-1 ~]$ rpm -qa | grep apptainer
[grma@eespot-hpchb120rsv3-1 ~]$ apptainer build --sandbox wps_wrf_4p4p2_py3_latest_azure ./wrf_4p4p2_py3_latest.sif
Is it possible to get this later package distributed to the Ubuntu PPA repo for download and testing?
@khaiyichin I have not been able to reproduce the problem on my Ubuntu 20.04 VM. What type of filesystem is being used for temporary space? Make sure it is local disk, as many network filesystems don't work properly. I permanently set
@garymansellricardo What makes you think that the AlmaLinux 8.7 package is slightly newer than the Ubuntu package? Aren't they both 1.3.1? I think your issue is most likely related to the /shared filesystem you're trying to write into, perhaps particularly when running as root. I also note that in the second case you're not running as root. That should be fine, especially if you have /etc/subuid mapping set up, but it is different.
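The filesystem question above can be answered with a short shell check. This is only a sketch: the helper names `apptainer_tmpdir` and `fs_type` are made up for illustration, and it assumes the documented lookup order for Apptainer's temporary space (APPTAINER_TMPDIR, then TMPDIR, then /tmp) and a `df` that supports `-T`, such as the GNU coreutils one.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Apptainer itself.

# Print the directory Apptainer is assumed to use for temporary build
# files: APPTAINER_TMPDIR if set, else TMPDIR, else /tmp.
apptainer_tmpdir() {
    printf '%s\n' "${APPTAINER_TMPDIR:-${TMPDIR:-/tmp}}"
}

# Print the filesystem type (e.g. xfs, ext4, nfs, tmpfs) backing a path.
# -P keeps each filesystem on one line; -T adds the type column.
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}
```

Running `fs_type "$(apptainer_tmpdir)"` prints the type of the filesystem that would hold the temporary files. A network type such as nfs there would match the concern raised above.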
@DrDaveD - you are probably correct, but please see the two slightly differing apptainer --version outputs for the two OSes below. I don't think it has anything to do with my /shared NFS filesystem (where my user home dir and bind mount dirs are located), as I am not binding any folders at this point, just extracting the SIF container file to a local filesystem. But I will run both tests again as root (whose home dir is local to the node) and copy the SIF container to a local scratch disk on the node before extracting, which (I think?) will remove this /shared NFS dir from the mix.
See below: I have run the same operation on both Ubuntu 20.04 (which fails) and AlmaLinux 8 (which works). I have included the apptainer version output and the mounted filesystem types for each node, and am running from /scratch, which is a local md device (2x NVMe disks in a stripe) with an xfs filesystem on both nodes.
Ubuntu 20.04.6 LTS:
AlmaLinux 8.7:
This also works OK on Ubuntu 22.04 - seems to be just Ubuntu 20.04 that is the problem:
I have gone back to the problem Ubuntu 20.04 VM and have explicitly set the APPTAINER_TMPDIR to a large, local scratch folder and still get the error:
Thanks for narrowing it down. That extra version information on el8 is just packaging information, not really relevant. I think it must be something specific about your Ubuntu 20.04 host, because it doesn't happen for me. You two aren't both using the same host, are you? We haven't seen this exact error before, although we have seen a case where configuration for other container systems that use shared code caused problems, in #758. That's the only idea I have.
I am using the Microsoft Azure Ubuntu 20.04 HPC VM SKU, so it would surprise me if there were something wrong with that image but not the 22.04 image, as I expect they use a similar recipe for building both: they just install additional HPC software packages such as MPI, Nvidia CUDA stuff etc. @khaiyichin - what Ubuntu 20.04 image are you using?
@DrDaveD - I really need to be able to extract containers to a sandbox on Ubuntu 20.04. Is there anything I can do to get better logs, or to further investigate what the problem could be?
First, make sure it happens with even simple images like the one defined in the description of this PR. Next, if I had root access on the system, I would try using strace to narrow down exactly what it is that apptainer is losing a connection with. That can be very tricky because apptainer ends up calling other processes and strace sometimes gets stuck. Sometimes I have had to replace the starter, for example, with a script or a binary that invokes strace before invoking the real starter. With a build it ends up re-invoking apptainer with a bunch of parameters, and I think it's that apptainer that is experiencing the error, so inserting strace can be even trickier, quite likely requiring compiling a modified version of apptainer. You can see what it is doing with
If you could give me temporary root access to a system where this can be reproduced I could do the investigation for you.
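The idea of swapping the starter for a script that calls strace first could be sketched roughly as below. This is not a tested procedure for any particular install. The `starter.real` path, the trace directory and the `run_traced` name are all hypothetical, and the sketch assumes the real starter has been moved aside and the wrapper is run as root, since strace cannot preserve setuid privileges for an unprivileged user.

```shell
#!/bin/sh
# Sketch of a wrapper that could stand in for Apptainer's starter binary.
# ASSUMPTION: the real starter was moved aside to REAL_STARTER; this
# default path is hypothetical and will differ between installs.
REAL_STARTER="${REAL_STARTER:-/usr/libexec/apptainer/bin/starter.real}"
# Directory that collects trace files (hypothetical default).
TRACE_DIR="${TRACE_DIR:-/tmp/apptainer-strace}"
# Tracer command; overridable so a full path can be supplied.
STRACE="${STRACE:-strace}"

run_traced() {
    mkdir -p "$TRACE_DIR" || return 1
    # -f follows child processes, -tt adds timestamps, and -o writes
    # the trace to a per-PID file instead of stderr.
    exec "$STRACE" -f -tt -o "$TRACE_DIR/starter.$$.trace" "$REAL_STARTER" "$@"
}

# In an installed wrapper the final line would be:
#   run_traced "$@"
```

Because the wrapper ends in `exec`, the traced starter replaces the wrapper process and receives the original arguments unchanged. The resulting trace files can then be searched for the failing mount call.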
Sorry for the late reply! I'm not too familiar with a lot of the system-level stuff, but I'm getting this on my Ubuntu 20.04 laptop, so to answer your question, @DrDaveD, I'm pretty sure the temporary filespace
@garymansellricardo Can you try some of the suggestions I gave?
@DrDaveD - Sorry, I have been on holiday and had some pressing work projects... I can't seem to re-create the issue now on the (Azure) Microsoft HPC Ubuntu 20.04 marketplace image that I was seeing it on before, but I notice that the version now in the repo is 1.3.2 (it was 1.3.1 before), so perhaps something was done to fix this in the new release? @khaiyichin - can you perhaps try the 1.3.2 version and confirm it is fixed for you?
Version of Apptainer
Expected behavior
Should generate a new image that's based on the first one.
Actual behavior
I built the first image without any issues with sudo, but building the second (which is based on the first) fails with
I also ran it with --debug (see below).
Steps to reproduce this behavior
It could be that this is an issue related to my machine; what I did was build one.def and then two.def with
sudo apptainer build one.sif one.def; sudo apptainer build two.sif two.def
What OS/distro are you running
How did you install Apptainer
sudo apt install apptainer
Actual behavior when run with --debug