Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGTERM isn't propegated or respected, making docker stop slow and then forceful #771

Closed
consideRatio opened this issue Mar 14, 2024 · 10 comments · Fixed by #773
Closed

Comments

@consideRatio
Copy link
Contributor

consideRatio commented Mar 14, 2024

Issue

docker stop and kubectl delete etc will pass a SIGTERM signal to kindly ask that it self-terminates, and after a while (10 seconds for docker, 30 seconds for kubectl), it will pass SIGKILL forcefully terminating everything. What often can happen is that a SIGTERM signal is only passed to process id 1 in the container, but then not propegated onwards to other processes that otherwise would terminate as long as they received the signal.

# is the container able to terminate on SIGTERM in less than 5 seconds
# or is it going to be killed forcefully after 10 seconds as is the default for
# docker to wait?
container_id=$(docker run -d rocker/binder)
timeout 5 docker stop $container_id && echo 'Stopped in time' || echo 'SIGTERM likely not respected'

A resolution

By using an entrypoint called tini, we can resolve this.

FROM rocker/binder:latest

USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        tini
USER ${NB_USER}

ENTRYPOINT ["tini", "--"]
# test of the Dockerfile above to conclude it can stop within a second
docker build . -t rocker/binder:tini

container_id=$(docker run -d rocker/binder:tini)
timeout 1 docker stop $container_id && echo 'Stopped in time' || echo 'SIGTERM likely not respected'
3362bb73d6610efc08fe89c72e46339f9045c9aa929b92dea49e8d62961c1381
Stopped in time
@cboettig
Copy link
Member

Nice, thanks. Do I follow correctly that this issue just impacts the binder-based images running the default command

CMD ["/bin/sh", "-c", "jupyter lab --ip 0.0.0.0 --no-browser"]

Would you suggest just changing this entrypoint for the binder-based images? Or is this unrelated to the default CMD that is being used to launch the instance? Is tini being used as the entrypoint in, say, pangeo/jupyter stacks?

You mention both kubectl delete and docker stop -- speaking of runtimes, what about singularity? (I believe we have a sizable number of singularity users and init issues can crop up there too, perhaps tini could be helpful).

As you probably saw, most of the other images in this stack already use a lightweight init manager, s6-init, by default. tini looks intriguing, especially in this mode of running as a normal user and only needing to alter the default ENTRYPOINT, not the default CMD. It's been a long time since we evaluated init managers and I haven't kept up on the comparisons but would welcome discussion!

Thanks much.

@consideRatio
Copy link
Contributor Author

I think the issue is for all images in this repo, but I've not verified it. By declaring ENTRYPOINT next to CMD, they combine to the final command to execute where entrypoint comes first - if not specified, it just defaults to a shell running the command in CMD i think.

The issue is probably the same for most container runtimes, at least those defaulting entrypoint to a shell.

I didn't see that a6-init was used, and have not heard about it before. If something like that is already used, it should be declared in the entrypoint i think, otherwise a shell will be the entrypoint and fail to propegate signals to the thing provided in CMD i think.

Got to run back to taking care of our little baby!

@cboettig
Copy link
Member

Thanks @consideRatio . Is tini used in other projects you are familiar with, e.g. in the jupyterhub system? I think we should definitely consider revisiting this:

  • switching our /init from CMD to ENTRYPOINT (either with s6 or tini)
  • and switching from s6-overlay to tini

s6-init has always been a source of friction. We adopted maybe almost a decade ago when it was afaik the only init manager that played nicely with Docker (and there were many long blog posts about the problems of using either of the more standard init managers distributed in Ubuntu with docker). s6-overlay install has always been a bit more convoluted than a simple apt-get, and because it's not coming from official ubuntu repos it means we have to pin version with an env var, and it's always been a bit complex to configure. So I think there's good reasons for us to revisit this and maybe switching to tini.

That said, this could easily be a breaking change, and I think the issue is specific to binder image, so for now it probably makes more sense to address it there.

Note that s6-init is used in all containers in this image other than the base image, r-ver, and the binder image, though it is declared as the default CMD https://github.com/rocker-org/rocker-versioned2/blob/master/dockerfiles/rstudio_4.3.3.Dockerfile#L20 , and that s6 should be respecting this correctly:

container_id=$(docker run -d rocker/rstudio)
timeout 5 docker stop $container_id && echo 'Stopped in time' || echo 'SIGTERM likely not respected'
Stopped in time

The binder image overrides the default command, and thus bypasses S6-init. r-ver doesn't have this issue as it runs the default interactive R shell and not a service.

container_id=$(docker run -d rocker/r-ver)
timeout 5 docker stop $container_id && echo 'Stopped in time' || echo 'SIGTERM likely not respected'

Nevertheless, we've fallen behind the times here -- Scanning the docs, it looks like it now supports / recommends using ENTRYPOINT as well. So I do think it makes sense to migrate away from the s6-overlay mechanism to something simpler and more standard, but probably wait for the next major lift. Meanwhile maybe we can consider a simpler fix for binder image alone? @consideRatio 's solution looks reasonable to me. Maybe you can send a PR?

@consideRatio
Copy link
Contributor Author

Is tini used in other projects you are familiar with, e.g. in the jupyterhub system?

Yes quite systematically across jupyterhub project's providing Dockerfiles, either tini or no no init system. It has only been tini used so far in the JupyterHub org. I lack experience to compare, but we have only used a very trivial entrypoint compared to for example systemd or similar to start up many things, our goal has mostly been to get small features tini provides, where I don't understand the details of them all.

  • switching our /init from CMD to ENTRYPOINT (either with s6 or tini)
  • and switching from s6-overlay to tini

I'm not confident on how things get started from running /init, which seems to be setup via a6-overlay being isntalled, but I think if possible its nice to separate the init process from what we may want to start within the container. For example in the JupyterHub chart's hub image that starts JupyterHub itself, we have it like this:

ENTRYPOINT ["tini", "--"]
CMD ["jupyterhub", "--config", "/usr/local/etc/jupyterhub/jupyterhub_config.py"]

Since only /init is called, I don't understand how to transition to tini, as I then assume a6-overlay is similar to systemd that I presume things are started up by being registered to a6-overlay etc, and that makes it hard for me to contribute with a migration to tini if you want to use tini.

I opened #773, but I think I'd like to settle with that as I lack knowledge about a6-overlay and isn't motivated enough given time my contrainsts to learn about it at the moment.

@consideRatio
Copy link
Contributor Author

Thank you @cboettig for your quick feedback and discussion about this, and your work in this project in general!!!

@benz0li
Copy link
Contributor

benz0li commented Mar 16, 2024

A good overview: https://ahmet.im/blog/minimal-init-process-for-containers/

@cboettig
Copy link
Member

Thanks @benz0li , that's a nice writeup. Sounds like s6-overlay is still the favorite option for handling more than one child process (which I know at least some users take advantage of, though I agree it's a bit of an edge case. yes the blog shows a hack around that with wait -n but iiuc that no longer meets the exit gracefully issue here).

It's notable to me that docker already supports drop-in tini via --init, but I imagine that's not true for other runtimes that might be used (e.g. by k8s)?

container_id=$(docker run -d  --init rocker/binder)
timeout 5 docker stop $container_id && echo 'Stopped in time' || echo 'SIGTERM likely not respected'
Stopped in time

I think we should try to move to tini entrypoint in our next overhaul of the build logic. we can keep the s6 install script in scripts dir as an opt-in fallback.

@benz0li
Copy link
Contributor

benz0li commented Mar 18, 2024

I think we should try to move to tini entrypoint in our next overhaul of the build logic. we can keep the s6 install script in scripts dir as an opt-in fallback.

@cboettig Then, you need to adapt

## Set our dynamic variables in Renviron.site to be reflected by RStudio Server or Shiny Server
exclude_vars="HOME PASSWORD RSTUDIO_VERSION BATCH_USER_CREATION"
for file in /var/run/s6/container_environment/*; do
sed -i "/^${file##*/}=/d" "${R_HOME}/etc/Renviron.site"
regex="(^| )${file##*/}($| )"
[[ ! $exclude_vars =~ $regex ]] && echo "${file##*/}=$(cat "${file}")" >>"${R_HOME}/etc/Renviron.site" || echo "skipping ${file}"
done

to something like

## Set our dynamic variables in Renviron.site to be reflected by RStudio Server or Shiny Server
exclude_vars="BATCH_USER_CREATION HOME OLDPWD PASSWORD PWD RSTUDIO_VERSION SHLVL"
for var in $(compgen -e); do
  [[ ! $exclude_vars =~ $var ]] && echo "$var=${!var}" >>"$(R RHOME)/etc/Renviron.site" || echo "skipping ${var}" \
done

IMHO LD_LIBRARY_PATH and PATH should also be part of exclude_vars.

Cross reference: jupyterhub/jupyter-rsession-proxy#145

@cboettig
Copy link
Member

@benz0li oh interesting. Yeah I was wondering how we ensure init scripts are run if we drop S6.

I'm afraid this went a bit over my head though -- how do we have an init script that executes at runtime before bringing up RStudio? e.g. we'd no longer have these, right?:

# install user config initiation script
cp /rocker_scripts/init_set_env.sh /etc/cont-init.d/01_set_env
cp /rocker_scripts/init_userconf.sh /etc/cont-init.d/02_userconf
cp /rocker_scripts/pam-helper.sh /usr/lib/rstudio-server/bin/pam-helper

@benz0li
Copy link
Contributor

benz0li commented Mar 18, 2024

cboettig added a commit that referenced this issue Mar 29, 2024
Fixes #771, not by introducing `tini`, but by ensuring we don't let `sh`
be the init process which in turn runs `jupyter lab` without propegating
SIGTERM to it.

There may be a point to introduce `tini` or similar, but from the
SIGTERM perspective, this does the trick as well.

This PR is modelled on for example
https://github.com/rocker-org/rocker-versioned2/pull/740/files where I
see that all Dockerfiles are updated along with a repsective .json file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants