-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix BASE_IMAGE Issue and Optimise Docker Build Time to 8.3m with Precompilation #47
Conversation
- Added explicit installation and precompilation of MKL_jll in the Dockerfile to resolve warnings about missing precompiled modules during container startup. - Precompiled MKL_jll and IntelOpenMP artifacts during the build phase, significantly reducing precompilation time from 18 minutes to 9 minutes by avoiding on-the-fly precompilation. - The time reduction is attributed to precompiling MKL_jll in advance, preventing the heavy processing load that would otherwise occur during runtime. - Added gdal-bin, libgdal-dev, and libfftw3-dev to the Dockerfile to support system dependencies required for geospatial and scientific computations.
Dockerfile
Outdated
COPY ./Project.toml ./Project.toml | ||
COPY ./Manifest.toml ./Manifest.toml | ||
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(verbose=true)' | ||
# Pre-download and cache MKL_jll dependency before general precompilation | ||
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure which project is being instantiated here.
Am I correct in assuming the build is faster to complete because the step to precompile MKL and others is essentially skipped in subsequent builds?
Additionally, are the two copy statements above necessary? They seem to simply copy the current files to their current locations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ConnectedSystems Yes, subsequent builds are faster because the MKL_jll and other dependencies are precompiled and cached, which prevents the need for recompilation at runtime.
Regarding the COPY statements: yes, they are needed to ensure Project.toml and Manifest.toml are available for Julia to install and precompile dependencies during Pkg.instantiate()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the COPY statements: yes, they are needed to ensure Project.toml and Manifest.toml are available for Julia to install and precompile dependencies during Pkg.instantiate()
Yes, I'm asking for clarification as it looks like it is redundant (copy to same location?)
COPY ./here ./here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ConnectedSystems No, it was not redundant. COPY ./Project.toml ./Project.toml transfers Project.toml from the local directory into the container, which is necessary for dependency setup. I have simplified this further by consolidating it into a single line: COPY Project.toml Manifest.toml ./.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. Obviously I'm not so familiar with Docker
Dockerfile
Outdated
COPY Project.toml Manifest.toml ./ | ||
|
||
# Pre-download and cache MKL_jll dependency before general precompilation | ||
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong - MKL_jll
was already a dependency listed in Project/Manifest, right? So the only condition under which this would be faster is if a) the user has already fully built the project b) they go to build again and c) the Project.toml and Manifest.toml has not changed. In that situation, wouldn't the whole pre-compilation already be skipped since the cache would apply to these three layers anyway
COPY Project.toml Manifest.toml ./
# Pre-download and cache MKL_jll dependency before general precompilation
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
# Precompile Julia packages using BuildKit cache for better efficiency
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
I'm not against pulling out specific 'base' dependencies that we think the image should always have, however, there are some risks
- what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right?
- why is this dependency special? Why not all the others? Is it just because it's slow?
If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files e.g. this order
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
COPY Project.toml Manifest.toml ./
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
This way, you would never need to pre-comp this layer again even if manifest/project change.
So in summary - I don't fully understand why this change actually reduces build times since a change in manifest/project will cause the whole thing to generally precompile again, and I have some concerns about adding hard coded dependencies which aren't version managed into the docker file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'm confused by this too. All dependencies should be defined by Project.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterBaker0
Correct me if I'm wrong - MKL_jll was already a dependency listed in Project/Manifest, right? So ..........
MKL_jll is included through dependencies like LinearAlgebra in Project.toml, meaning it will be precompiled during Pkg.instantiate() and Pkg.precompile(). Docker caching will indeed optimise repeat builds when Project.toml and Manifest.toml remain unchanged by reusing layers and skipping full re-precompilation.
The explicit precompilation of MKL_jll primarily benefits initial build times and mitigates startup warnings. By precompiling MKL_jll separately, we handle its specific overhead upfront, reducing runtime costs and potential resource bottlenecks during the first build. However, in subsequent cached builds, this layer’s impact is minimized as Docker caching will efficiently skip redundant precompilations when dependencies haven’t changed.
If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files
We need MKL_jll to be installed with the exact version specified in Manifest.toml, so it’s important to copy both Project.toml and Manifest.toml files first. With this approach, even if Project.toml or Manifest.toml changes and triggers a full re-precompile, Docker will maintain a separate layer for MKL_jll. This separate layer can streamline subsequent builds, as MKL_jll will already be precompiled and cached independently of the other dependencies. This optimises for scenarios where MKL_jll remains unchanged while other dependencies are updated.
what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right?
Again, I can see from the Manifest.toml file that LinearAlgebra is configured to use MKL. If this is not essential, we could switch to using OpenBLAS instead. By running this after copying the Manifest.toml file, we ensure that the version specified in Manifest.toml is used. I have removed Pkg.add to prevent it from pulling a different version, ensuring the version from Manifest.toml is used.
why is this dependency special? Why not all the others? Is it just because it's slow?
MKL_jll is singled out because it is a large and resource-intensive dependency with a known tendency to cause significant build-time delays and runtime warnings if not precompiled. By precompiling MKL_jll explicitly during the Docker build phase, the process prevents these warnings and reduces the load at runtime. Other dependencies, generally less demanding, don’t typically require this upfront handling, which is why they aren't singled out in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterBaker0 Correct me if I'm wrong - MKL_jll was already a dependency listed in Project/Manifest, right? So ..........
MKL_jll is included through dependencies like LinearAlgebra in Project.toml, meaning it will be precompiled during Pkg.instantiate() and Pkg.precompile(). Docker caching will indeed optimise repeat builds when Project.toml and Manifest.toml remain unchanged by reusing layers and skipping full re-precompilation.
The explicit precompilation of MKL_jll primarily benefits initial build times and mitigates startup warnings. By precompiling MKL_jll separately, we handle its specific overhead upfront, reducing runtime costs and potential resource bottlenecks during the first build. However, in subsequent cached builds, this layer’s impact is minimized as Docker caching will efficiently skip redundant precompilations when dependencies haven’t changed.
What do you mean by "initial build times" - why would initial build times be reduced? You're not actually doing less in the build, right?
"reducing runtime costs" - what runtime costs? Isn't this about decreasing build time? Are you saying the server starts faster?
"efficiently skip redundant precompilations when dependencies haven’t changed." I still don't understand why this is any different than how it was before? If the dependencies didn't change, then it would never have re-precompiled, right?
If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files
We need MKL_jll to be installed with the exact version specified in Manifest.toml, so it’s important to copy both Project.toml and Manifest.toml files first. With this approach, even if Project.toml or Manifest.toml changes and triggers a full re-precompile, Docker will maintain a separate layer for MKL_jll. This separate layer can streamline subsequent builds, as MKL_jll will already be precompiled and cached independently of the other dependencies. This optimises for scenarios where MKL_jll remains unchanged while other dependencies are updated.
I just don't think this is true. Docker cache layers will always be invalidated if the file hash of copied inputs changes. E.g.
- user adds package
- manifest file etc change
- this invalidates the copy layer and all layers below it
- MKL_jll layer needs to rebuild because step 3 invalidated this layer
- other packages need to rebuild
what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right? Again, I can see from the Manifest.toml file that LinearAlgebra is configured to use MKL. If this is not essential, we could switch to using OpenBLAS instead. By running this after copying the Manifest.toml file, we ensure that the version specified in Manifest.toml is used. I have removed Pkg.add to prevent it from pulling a different version, ensuring the version from Manifest.toml is used.
Sure - but then you will need to recompile MKL_jll anytime the Manifest or Project toml files change, which removes the benefit of pulling it out separately anyway?
why is this dependency special? Why not all the others? Is it just because it's slow?
MKL_jll is singled out because it is a large and resource-intensive dependency with a known tendency to cause significant build-time delays and runtime warnings if not precompiled. By precompiling MKL_jll explicitly during the Docker build phase, the process prevents these warnings and reduces the load at runtime. Other dependencies, generally less demanding, don’t typically require this upfront handling, which is why they aren't singled out in the same way.
Sure - but you would need to pin this version into the dockerfile itself and build it before you bring in project/manifest toml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts:
If MKL_jll does indeed contribute significantly to build times, then there would be benefit to precompiling it separately.
Unlike other approaches, say conda
in Python, where a separate copy of the package is installed for each environment, Julia will share packages that are previously installed across environments so long as the versions align.
This would be beneficial in the case that the version installed by this line:
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
is indeed compatible with the version specified in the Project.toml
and Manifest.toml
files, as installation/compilation is effectively skipped given the previously installed package can be used instead.
But as soon as there is version misalignment (the line above initially installing an older version, and the ReefGuideAPI.jl package specifying a newer version), then everything needs to be re-compiled. But subsequent builds after this will be "fast" again.
In the case the Project.toml file changes, but does not affect the MKL_jll
dependency, then we would still get the improved build times.
I think we need some way of running the line above again if/when the version of MKL_jll
in the Project.toml
file changes, otherwise it'll just silently keep building slowly...
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how Docker cache works (from my understanding).
Layer cache hash := hash(file system hash at start of layer, build command and files referenced in the build command)
If the project/manifest change at all, even just a new line or an extra whitespace, this hash is now different (since the build command/referenced files has changed). This will always invalidate all following layers.
So afaik this "In the case the Project.toml file changes, but does not affect the MKL_jll dependency, then we would still get the improved build times." is not true.
But I talked with @pascal082 and I am going to validate this is the case.
I'm not unconvinced that the build times improved, just trying to understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming that the above explanation matches tested behaviour i.e. any change to dependencies (Manifest/Project files) results in the entire build occurring again including MKL_jll
@@ -1,5 +1,5 @@ | |||
# See https://hub.docker.com/_/julia for valid versions. | |||
ARG JULIA_VERSION="1.10.5" | |||
ARG JULIA_VERSION="1.11.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to do 1.11
so it's always latest minor, or use exact versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arlowhite It’s best to always confirm and specify the exact version you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always @pascal082 - particularly as a default. I think 1.11 is fine as well. If a user needs to specify an exact version they can always override the build arg.
If you run npm i <package>
the default is ^x.y.z
, which allows for patch version increments, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think v1.11 is fine too
…nd cache it before main package precompilation.
Dockerfile
Outdated
COPY ./Project.toml ./Project.toml | ||
COPY ./Manifest.toml ./Manifest.toml | ||
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(verbose=true)' | ||
COPY Project.toml Manifest.toml ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Julia 1.11, it should be Manifest-v1.11.toml
Dockerfile
Outdated
COPY Project.toml Manifest.toml ./ | ||
|
||
# Pre-download and cache MKL_jll dependency before general precompilation | ||
RUN julia -e 'using Pkg; pkg_path = Pkg.TOML.parsefile("Manifest.toml")["packages"]["MKL_jll"]["git-tree-sha1"]; Pkg.develop(PackageSpec(name="MKL_jll", version=pkg_path)); Pkg.precompile()' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERROR: KeyError: key "packages" not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this too
@pascal082 Given the above discussion - I think the following
|
@PeterBaker0 Regarding MKL: I believe what's happening is that by pre-installing MKL_jll, dependent packages like FFTW and LinearAlgebra recognize it immediately, skipping redundant configuration steps. This results in faster build times and fewer re-compilations. See the screenshot showing compile times for packages using MKL -pre install/not. This improvement is related to changes in Julia 1.11 affecting package compilation . @arlowhite and @PeterBaker0 Also, since there will be different versions, could we name the Manifest.toml file following a version-specific naming convention? For example, for version 1.11.1, it could be named Manifest-1.11.1.toml, allowing the Dockerfile to select the appropriate version automatically. |
Seeing if this resolves horrendous build times
In your MKL comparison, those look like different log sections. First in "Installed", second has time in ms, I'd assume precompile time. |
In discussion with @PeterBaker0 I've pushed an updated Project/Manifest file which explicitly specifies The thinking is that it will help the dependency resolver optimise the precompile order to avoid method invalidations. Could someone test a build with this? |
… cache for base MKL dep Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Doesn't help unfortunately. |
@arlowhite @pascal082 I've pushed a revised workflow which pulls out the process of extracting the MKL_jll version into a separate script. I've also put some workflows which check this dependency is up to date, including before running a docker build. The benefit of this approach is that the Dockerfile can now reference this dependency file before copying in the Manifest/Project files, so we don't need to rebuild MKL_jll any time the project dependencies change. I tested these scenarios 7m48.721
Vs 13m48.615s
And indeed @pascal082 's observation is correct that it does reduce build time very significantly. With the latest approach, things seem even better (though probably within margin of error for system performance), last build was 5m38.439s. @pascal082 you should take a look and make sure changes make sense. |
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
@PeterBaker0 The remaining issue is ensuring we use the appropriate Manifest.toml file (in get_mkl.sh) that matches the Julia version we are working with get_mkl.sh Check if Manifest.toml existsif [ ! -f "Manifest.toml" ]; then |
.github/workflows/CheckMklDep.yml
Outdated
- name: Extract current MKL hash from Manifest | ||
id: extract | ||
run: | | ||
CURRENT_HASH=$(awk '/\[\[deps\.MKL_jll\]\]/{flag=1;next}/\[\[/{if(flag){exit}}flag&&/uuid/{gsub(/^.*= "|".*$/,"",$0);printf "%s", $0}' Manifest.toml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this to be pragmatic, but noting in future we should aim to remove awkward use of awk.
Could we at least add a comment explaining what this magic line does?
I don't like all of this complexity escaping the Dockerfile. This works:
Also, the "hash" is a project/package UUID, it never changes. EDIT: problem here is it requires moving the manifest copy above, which will invalidate the cache more often. |
Very zen Arlo, I prefer this approach. |
…ndnecy Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
@arlowhite @ConnectedSystems @pascal082 have simplified the approach using combination of methods mentioned in conversations. Removed all github actions to sync mkl.dep as it was not worth complexity for meagre optimisations which are not relevant to github actions anyway. See https://github.com/open-AIMS/ReefGuideAPI.jl/actions/runs/11585249026 for test build running now. |
@pascal082 this is resolved now - have added Manifest* instead of Manifest to docker copy, and the above script no longer exists |
@@ -28,6 +28,7 @@ JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" | |||
JSONWebTokens = "9b8beb19-0777-58c6-920b-28f749fee4d3" | |||
LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb" | |||
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" | |||
MKL_jll = "856f044c-d86e-5d09-b602-aeab76dc8ba7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe more correct to put in weakdeps, but not sure if it affects how Julia operates
Fix BASE_IMAGE not working
Optimised Docker build time from 18m to 8.3m by precompiling MKL_jll and IntelOpenMP artifacts, resolving warnings and reducing startup load, and added gdal-bin, libgdal-dev, and libfftw3-dev for required geospatial and scientific dependencies