Commit 3780a7e

add initial build guide
1 parent 5d4f6fa commit 3780a7e

2 files changed

+335
-0
lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -3,3 +3,7 @@
33
_WIP_
44

55
Sample configurations for the TRE Container Execution Service (CES).
6+
7+
## Documentation
8+
9+
- [Container Build Guide](docs/container-build-guide.md)

docs/container-build-guide.md

Lines changed: 331 additions & 0 deletions
@@ -0,0 +1,331 @@
1+
# Container Image Build Guide
2+
3+
_WIP_
4+
5+
This document outlines best practices for building and publishing software container images which are ready for the TRE Container Execution Service.
6+
7+
We assume that you are already familiar with Docker concepts and have some experience in building your own images. If you are new to Docker, then the following resources would be a good starting point:
8+
- https://carpentries-incubator.github.io/docker-introduction/
9+
- https://docs.docker.com/get-started/overview/
10+
- https://docker-curriculum.com/
11+
12+
It is also assumed that software development best practices are followed, such as version control using Git(Hub) and Continuous Integration (CI). It is also recommended to enable a tool such as [Dependabot](https://docs.github.com/en/code-security/getting-started/dependabot-quickstart-guide) to receive alerts and automated Pull Requests for dependencies.
13+
14+
This guide is intended to provide an overview of building images appropriate for the TRE, and not a full end-to-end explanation for packaging existing analysis code into a container. A particular focus is placed on:
15+
- Optimising the build process to reduce the content of an image, the build time, and the final image size,
16+
- Ensuring common mistakes are highlighted through the use of code linting tools,
17+
- Automating the image publishing process using the GitHub Actions (GHA) CI service.
18+
19+
## Contents
20+
21+
- [1. Writing a `Dockerfile`](#1-writing-a-dockerfile)
22+
- [1.1. Checklist for writing `Dockerfile`s](#11-checklist-for-writing-dockerfiles)
23+
- [1.2. `Dockerfile` Example](#12-dockerfile-example)
24+
- [2. Building Locally](#2-building-locally)
25+
- [2.1. Checklist for building Docker images](#21-checklist-for-building-docker-images)
26+
- [2.2. Local Docker Build Example](#22-local-docker-build-example)
27+
- [2.3. pre-commit](#23-pre-commit)
28+
- [3. Building in CI](#3-building-in-ci)
29+
- [4. Publishing in CI](#4-publishing-in-ci)
30+
- [5. References](#5-references)
31+
- [0. Development Notes](#0-development-notes)
32+
33+
## 1. Writing a `Dockerfile`
34+
35+
We have compiled a checklist for Dockerfile creation using these resources as a basis:
36+
- https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
37+
- https://sysdig.com/blog/dockerfile-best-practices/
38+
39+
### 1.1. Checklist for writing `Dockerfile`s
40+
41+
- [ ] Images in all `FROM` statements are fully-qualified and pinned
42+
- **Rationale**: Docker images can be hosted on multiple registries, and image tags are mutable. The only way to ensure reproducible builds is to pin images by their full digest. The digest of a locally pulled image can be viewed with `docker inspect --format='{{index .RepoDigests 0}}' <image>`. The registry is usually `docker.io` or `ghcr.io`.
43+
- **Example**:
44+
```dockerfile
45+
# Incorrect
46+
FROM nvidia/cuda:latest
47+
FROM docker.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
48+
FROM docker.io/nvidia/cuda:latest
49+
50+
# Correct
51+
FROM docker.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04@sha256:622e78a1d02c0f90ed900e3985d6c975d8e2dc9ee5e61643aed587dcf9129f42
52+
```
53+
- [ ] Consecutive and related commands are grouped into a single `RUN` statement
54+
- **Rationale**: Each `RUN` statement causes a new layer to be created in the image. By grouping `RUN` statements together, and deleting temporary files within the same statement, the final image size can be greatly reduced.
55+
- **Example**:
56+
```dockerfile
57+
# Incorrect
58+
RUN apt-get -y update
59+
RUN apt-get -y install curl
60+
RUN apt-get -y install git
61+
62+
# Correct
63+
# - Single RUN statement with commands broken over multiple lines
64+
# - Temporary apt files are deleted and not stored in the final image
65+
RUN : \
66+
&& apt-get update -qq \
67+
&& DEBIAN_FRONTEND=noninteractive apt-get install \
68+
-qq -y --no-install-recommends \
69+
curl \
70+
git \
71+
&& apt-get clean \
72+
&& rm -rf /var/lib/apt/lists/* \
73+
&& :
74+
```
75+
- [ ] Multi-stage builds are used where appropriate
76+
- **Rationale**: Separating build/compilation steps into a separate stage helps to minimise the content of the final image and reduce the overall size
77+
- **Example**:
78+
```dockerfile
79+
FROM some-image AS builder
80+
RUN apt update && apt -y install build-dependencies
81+
COPY . .
82+
RUN ./configure --prefix=/opt/app && make && make install
83+
84+
FROM some-minimal-image
85+
RUN apt update && apt -y install runtime-dependencies
86+
COPY --from=builder /opt/app /opt/app
87+
```
88+
- [ ] A non-root `USER` without a specific UID is defined
89+
- **Rationale**: By default, `RUN` commands and the command set as the `ENTRYPOINT` of the container run as the `root` user. It is best practice to define an unprivileged user with limited scope.
90+
- **Example**:
91+
```dockerfile
92+
RUN groupadd --system nonroot && useradd --no-log-init --system --gid nonroot nonroot
93+
USER nonroot
94+
ENTRYPOINT ["python", "app.py"]
95+
```
96+
- [ ] Executables are owned by `root` and are not writable
97+
- **Rationale**: Executables in the container should not be modifiable at runtime. Running as a non-root user and making executables owned by root helps ensure containers are immutable at runtime. Explicitly using `--chown` and `--chmod` may not be necessary depending on how the executable has been built.
98+
- **Example**:
99+
```dockerfile
100+
COPY --from=builder --chown=root:root --chmod=555 app.py app.py
101+
```
102+
- [ ] A minimal image is used in the last stage to reduce the final image size
103+
- **Rationale**: When using multi-stage builds, the final `FROM` image should use a minimal image such as from [Distroless](https://github.com/GoogleContainerTools/distroless/) or [Chainguard](https://images.chainguard.dev/) to minimise image content and size
104+
- **Example**:
105+
```dockerfile
106+
FROM some-image AS builder
107+
# ...
108+
FROM gcr.io/distroless/base-debian12
109+
# ...
110+
```
111+
- [ ] `COPY` is used instead of `ADD`
112+
- **Rationale**: Compared to the `COPY` command, `ADD` supports much more functionality such as unpacking archives and downloading from URLs. While this may seem convenient, using `ADD` may result in much larger images with layers which are unnecessary
113+
- **Example**:
114+
```dockerfile
115+
# Incorrect
116+
ADD https://example.com/some.tar.gz /
117+
RUN tar -x -C /src -f some.tar.gz && ...
118+
119+
# Correct
120+
RUN curl -fsSL https://example.com/some.tar.gz | tar -xzC /src && ...
121+
```
122+
- [ ] No data files are copied into the image
123+
- **Rationale**: As a general rule, images should only contain software and configuration files. Any data files required will be presented to the container at runtime (e.g., via the `/safe_data` mount) and should not be copied into the container during the build
124+
- **Example**: N/A
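
The first checklist item (pinned `FROM` images) can be checked mechanically. The following is a minimal sketch, assuming one `FROM` instruction per line; `unpinned_from` is a hypothetical helper name, and references to earlier build stages (such as `FROM builder`) or to `scratch` are not treated specially, so they would be reported too.

```shell
# Hypothetical helper: print FROM lines in a Dockerfile that lack a digest.
# Assumption: each FROM instruction sits on a single line.
unpinned_from() {
  dockerfile="${1:-Dockerfile}"
  # Keep FROM instructions (case-insensitive), then drop those pinned
  # with "@sha256:". "|| true" keeps the exit status zero when nothing
  # is left to report.
  grep -iE '^[[:space:]]*FROM[[:space:]]' "${dockerfile}" \
    | grep -v '@sha256:' || true
}

# Example call (prints nothing if every FROM line is pinned):
# unpinned_from Dockerfile
```

An empty result suggests every base image is pinned by digest; any printed line is a candidate for pinning.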
125+
126+
### 1.2. `Dockerfile` Example
127+
128+
This example applies the above checklist to create a `Dockerfile` for a Python application which includes pytorch and CUDA dependencies. It assumes that the application code lives in `src/app.py` and requirements are in `requirements.txt`.
129+
130+
The multi-stage build results in a final image size of around 6.2 GB vs 11.2 GB as a single stage.
131+
132+
```dockerfile
# syntax=docker/dockerfile:1

ARG CONDA_DIR="/opt/conda"

# Build stage
FROM docker.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04@sha256:622e78a1d02c0f90ed900e3985d6c975d8e2dc9ee5e61643aed587dcf9129f42 AS builder

ARG CONDA_DIR
ARG PYTHON_VERSION="3.10"

ENV PATH="${CONDA_DIR}/bin:${PATH}"

RUN : \
    && apt-get update -qq \
    && DEBIAN_FRONTEND=noninteractive apt-get install \
        -qq -y --no-install-recommends \
        build-essential \
        bzip2 \
        ca-certificates \
        git \
        unzip \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && :

COPY requirements.txt /tmp/
RUN : \
    && set -eu \
    && wget --quiet "https://repo.continuum.io/miniconda/Miniconda3-py310_24.4.0-0-Linux-x86_64.sh" -O /tmp/miniconda.sh \
    && /bin/bash /tmp/miniconda.sh -b -p "${CONDA_DIR}" \
    && conda install -y python="${PYTHON_VERSION}" \
    && conda install -y -c pytorch -- pytorch torchvision cudatoolkit=10.0 \
    && conda install -y --file /tmp/requirements.txt \
    && conda clean --all -y \
    && rm /tmp/miniconda.sh /tmp/requirements.txt \
    && :

# App stage
FROM docker.io/nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04@sha256:2fcc4280646484290cc50dce5e65f388dd04352b07cbe89a635703bd1f9aedb6

ARG CONDA_DIR
ENV PATH="${CONDA_DIR}/bin:${PATH}"

COPY --from=builder /opt/conda /opt/conda

WORKDIR /src
COPY --chmod=444 app.py app.py

RUN groupadd --system nonroot && useradd --no-log-init --system --gid nonroot nonroot
USER nonroot

# app.py is read-only and not executable, so it is run via the interpreter
ENTRYPOINT ["python", "app.py"]
```
187+
188+
## 2. Building Locally
189+
190+
### 2.1. Checklist for building Docker images
191+
192+
- [ ] A linter such as [Hadolint](https://github.com/hadolint/hadolint) is used to verify `Dockerfile` content
193+
- **Rationale**: Automated code linting tools can be very useful in detecting common mistakes and pitfalls when developing software. Some configuration tweaks may be required however, as shown in the example below.
194+
- **Example**:
195+
```console
196+
# Ignore DL3008 (Pin versions in apt-get install)
197+
docker run --pull always --rm -i docker.io/hadolint/hadolint:latest hadolint --ignore DL3008 - < Dockerfile
198+
```
199+
- [ ] A temporary directory is used for the build context
200+
- **Rationale**: Using a temporary directory for the build context avoids unwanted files accidentally being copied into the image. The context is also copied during the build process, so will be slower if large files are included. A `.dockerignore` file can also be used to exclude certain files or file extensions.
201+
- **Example**:
202+
```console
203+
build_ctx=$(mktemp -d)
204+
cp file... "${build_ctx}"
205+
docker build --file Dockerfile "${build_ctx}"
206+
rm -r "${build_ctx}"
207+
```
208+
- [ ] The image is saved with a unique, descriptive tag
209+
- **Rationale**: While it is useful to define a `latest` tag, each production image should also be tagged with a label such as the version or build date. For non-local images, the registry and repository should also be included. Images can also be tagged multiple times.
210+
- **Example**:
211+
```console
212+
docker build \
213+
--tag ghcr.io/my/image:v1.2.3 \
214+
--tag ghcr.io/my/image:latest \
215+
...
216+
```
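
As a rough illustration of the tagging advice above, the sketch below prints a version tag, a version-plus-build-date tag, and `latest` for one image. `build_tags` is a hypothetical helper, the `YYYYMMDD` date suffix is an assumed convention rather than a TRE requirement, and `ghcr.io/my/image` is the placeholder name used in this guide.

```shell
# Hypothetical helper: print one fully-qualified tag per line.
# Arguments: IMAGE (registry/repository) and VERSION (e.g. v1.2.3).
build_tags() {
  image="$1"
  version="$2"
  # UTC build date; the YYYYMMDD format is an assumption of this sketch
  build_date="$(date -u +%Y%m%d)"
  echo "${image}:${version}"
  echo "${image}:${version}-${build_date}"
  echo "${image}:latest"
}

# Example call:
# build_tags ghcr.io/my/image v1.2.3
```

Each printed line can then be passed to `docker build` with its own `--tag` flag.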
217+
218+
### 2.2. Local Docker Build Example
219+
220+
```console
221+
docker run --pull always --rm -i docker.io/hadolint/hadolint:latest hadolint --ignore DL3008 - < Dockerfile
222+
223+
# Create a temporary directory for the build context
224+
build_ctx=$(mktemp -d)
225+
226+
# Copy only the needed files to build_ctx
227+
cp src/app.py requirements.txt "${build_ctx}"
228+
229+
# Build the image
230+
docker build \
231+
-f Dockerfile \
232+
--tag ghcr.io/my/container:v1.2.3 \
233+
--tag ghcr.io/my/container:latest \
234+
"${build_ctx}"
235+
236+
# Delete the temporary directory
237+
rm -r "${build_ctx}"
238+
```
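
One weakness of the sequence above is that the temporary directory is left behind if the build fails partway. A minimal sketch of one way to handle this, assuming a POSIX-like shell, is to stage the files in a helper and register the cleanup with `trap`. `stage_build_ctx` is a hypothetical name, not a Docker feature.

```shell
# Hypothetical helper: copy the given files into a new temporary
# directory and print that directory's path.
stage_build_ctx() {
  ctx="$(mktemp -d)"
  cp -- "$@" "${ctx}"
  echo "${ctx}"
}

# Intended use (commented out here because it needs Docker and the files):
# build_ctx="$(stage_build_ctx src/app.py requirements.txt)"
# trap 'rm -rf "${build_ctx}"' EXIT   # runs even if the build fails
# docker build -f Dockerfile --tag ghcr.io/my/container:v1.2.3 "${build_ctx}"
```

The `trap ... EXIT` line replaces the final `rm -r`, so the directory is removed whether or not `docker build` succeeds.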
239+
240+
### 2.3. pre-commit
241+
242+
Using the [`pre-commit`](https://pre-commit.com/) tool, it is possible to configure your local repository so that Hadolint (and similar tools) are run automatically each time `git commit` is run. This is recommended to ensure linting and auto-formatting tools are always run before code is pushed to GitHub.
243+
244+
To run Hadolint, include the hook in your `.pre-commit-config.yaml` file:
245+
246+
```yaml
repos:
  - repo: https://github.com/hadolint/hadolint
    rev: v2.12.0
    hooks:
      - id: hadolint-docker
```
253+
254+
## 3. Building in CI
255+
256+
Below is a sample GHA configuration which runs Hadolint, builds a container named `ghcr.io/my/repo`, then runs the [Trivy](https://aquasecurity.github.io/trivy) container scanning tool. The Trivy [SBOM](https://www.cisa.gov/sbom) report is then uploaded as a job artifact.
257+
258+
This assumes:
259+
- The repo contains a `Dockerfile` in the top-level directory,
260+
- The `Dockerfile` contains an `ARG` or `ENV` variable which defines the version of the packaged software.
261+
262+
```yaml
# File .github/workflows/main.yaml
name: main
on:
  push:
defaults:
  run:
    shell: bash
jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - name: checkout
        uses: actions/checkout@v4
      - name: run hadolint
        run: docker run --rm -i ghcr.io/hadolint/hadolint < Dockerfile
      - name: build image
        run: |
          set -euxo pipefail
          repository="ghcr.io/my/repo"
          version="$(grep _VERSION= Dockerfile | cut -d'"' -f2)"
          image="${repository}:${version}"
          docker build . --tag "${image}"
          echo "image=${image}" >> "$GITHUB_ENV"
          echo "Built ${image}"
      - name: run trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "${{ env.image }}"
          format: 'github'
          output: 'dependency-results.sbom.json'
          github-pat: "${{ secrets.GITHUB_TOKEN }}"
          severity: 'MEDIUM,CRITICAL,HIGH'
          scanners: "vuln"
      - name: upload trivy report
        uses: actions/upload-artifact@v4
        with:
          name: 'trivy-sbom-report'
          path: 'dependency-results.sbom.json'
```
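
The version lookup in the `build image` step relies on the `Dockerfile` having exactly one double-quoted `*_VERSION=` line. If several lines match (for example, an application version alongside a `PYTHON_VERSION` argument), the pipeline returns several values and the resulting image tag is invalid. The sketch below mirrors that pipeline but keeps only the first match; `extract_version` is a hypothetical helper name and `APP_VERSION` an assumed variable name.

```shell
# Hypothetical helper mirroring the workflow's grep/cut pipeline.
# Assumption: the first matching line looks like ARG APP_VERSION="1.2.3".
extract_version() {
  # grep selects lines containing "_VERSION=", head keeps the first one,
  # and cut returns the text between the first pair of double quotes.
  grep '_VERSION=' "${1:-Dockerfile}" | head -n 1 | cut -d'"' -f2
}

# Example call:
# version="$(extract_version Dockerfile)"
```

Placing the application's version line before any other `*_VERSION=` line, or matching on the full variable name, makes the lookup less fragile.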
302+
303+
Note that the `run hadolint` step can be skipped if you already run Hadolint through pre-commit together with the [pre-commit.ci](https://pre-commit.ci/) service.
304+
305+
## 4. Publishing in CI
306+
307+
__Note__: Images can also be built and pushed from your local environment as normal.
308+
309+
Once the stage has been reached where your software package is ready for distribution, the GHA example above can be extended to automatically publish new image versions to the GitHub Container Registry (GHCR). An introduction to GHCR can be found in the GitHub docs [here](https://docs.github.com/en/packages/quickstart).
310+
311+
```yaml
      # After the image has been built and scanned
      - name: push image
        run: |
          set -euxo pipefail
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker push "${image}"
```
319+
320+
## 5. References
321+
322+
- https://sysdig.com/blog/image-scanning-best-practices/
323+
- https://medium.com/the-artificial-impostor/smaller-docker-image-using-multi-stage-build-cb462e349968
324+
325+
## 0. Development Notes
326+
327+
This document could be expanded with guidance on:
328+
329+
- Building for x64 vs ARM
330+
- Using https://github.com/docker/build-push-action for caches
331+
- Optional testing as part of build
