Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix BASE_IMAGE Issue and Optimise Docker Build Time to 8.3m with Precompilation #47

Merged
merged 29 commits into from
Nov 11, 2024

Conversation

pascal082
Copy link
Contributor

@pascal082 pascal082 commented Oct 25, 2024

  1. Fix BASE_IMAGE not working

  2. Optimised Docker build time from 18m to 8.3m by precompiling MKL_jll and IntelOpenMP artifacts, resolving warnings and reducing startup load, and added gdal-bin, libgdal-dev, and libfftw3-dev for required geospatial and scientific dependencies

Pascal Omondiagbe added 2 commits October 21, 2024 12:13
- Added explicit installation and precompilation of MKL_jll in the Dockerfile to resolve warnings about missing precompiled modules during container startup.
- Precompiled MKL_jll and IntelOpenMP artifacts during the build phase, significantly reducing precompilation time from 18 minutes to 9 minutes by avoiding on-the-fly precompilation.
- The time reduction is attributed to precompiling MKL_jll in advance, preventing the heavy processing load that would otherwise occur during runtime.

- Added gdal-bin, libgdal-dev, and libfftw3-dev to the Dockerfile to support system dependencies required for geospatial and scientific computations.
@pascal082 pascal082 changed the title Pascal Fix BASE_IMAGE Issue and Optimise Docker Build Time to 8.3m with Precompilation Oct 25, 2024
Dockerfile Outdated
Comment on lines 109 to 112
COPY ./Project.toml ./Project.toml
COPY ./Manifest.toml ./Manifest.toml
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(verbose=true)'
# Pre-download and cache MKL_jll dependency before general precompilation
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
Copy link
Collaborator

@ConnectedSystems ConnectedSystems Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which project is being instantiated here.

Am I correct in assuming the build is faster to complete because the step to precompile MKL and others is essentially skipped in subsequent builds?

Additionally, are the two copy statements above necessary? They seem to simply copy the current files to their current locations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConnectedSystems Yes, subsequent builds are faster because the MKL_jll and other dependencies are precompiled and cached, which prevents the need for recompilation at runtime.

Regarding the COPY statements: yes, they are needed to ensure Project.toml and Manifest.toml are available for Julia to install and precompile dependencies during Pkg.instantiate()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the COPY statements: yes, they are needed to ensure Project.toml and Manifest.toml are available for Julia to install and precompile dependencies during Pkg.instantiate()

Yes, I'm asking for clarification as it looks like it is redundant (copy to same location?)

COPY ./here ./here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConnectedSystems No, it was not redundant. COPY ./Project.toml ./Project.toml transfers Project.toml from the local directory into the container, which is necessary for dependency setup. I have simplified this further by consolidating it into a single line: COPY Project.toml Manifest.toml ./.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Obviously I'm not so familiar with Docker

Dockerfile Outdated
COPY Project.toml Manifest.toml ./

# Pre-download and cache MKL_jll dependency before general precompilation
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong - MKL_jll was already a dependency listed in Project/Manifest, right? So the only condition under which this would be faster is if a) the user has already fully built the project b) they go to build again and c) the Project.toml and Manifest.toml has not changed. In that situation, wouldn't the whole pre-compilation already be skipped since the cache would apply to these three layers anyway

COPY Project.toml Manifest.toml ./

# Pre-download and cache MKL_jll dependency before general precompilation
RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'

# Precompile Julia packages using BuildKit cache for better efficiency
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

I'm not against pulling out specific 'base' dependencies that we think the image should always have, however, there are some risks

  • what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right?
  • why is this dependency special? Why not all the others? Is it just because it's slow?

If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files e.g. this order

RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'
COPY Project.toml Manifest.toml ./
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

This way, you would never need to pre-comp this layer again even if manifest/project change.

So in summary - I don't fully understand why this change actually reduces build times since a change in manifest/project will cause the whole thing to generally precompile again, and I have some concerns about adding hard coded dependencies which aren't version managed into the docker file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm confused by this too. All dependencies should be defined by Project.toml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterBaker0
Correct me if I'm wrong - MKL_jll was already a dependency listed in Project/Manifest, right? So ..........

MKL_jll is included through dependencies like LinearAlgebra in Project.toml, meaning it will be precompiled during Pkg.instantiate() and Pkg.precompile(). Docker caching will indeed optimise repeat builds when Project.toml and Manifest.toml remain unchanged by reusing layers and skipping full re-precompilation.

The explicit precompilation of MKL_jll primarily benefits initial build times and mitigates startup warnings. By precompiling MKL_jll separately, we handle its specific overhead upfront, reducing runtime costs and potential resource bottlenecks during the first build. However, in subsequent cached builds, this layer’s impact is minimized as Docker caching will efficiently skip redundant precompilations when dependencies haven’t changed.

If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files

We need MKL_jll to be installed with the exact version specified in Manifest.toml, so it’s important to copy both Project.toml and Manifest.toml files first. With this approach, even if Project.toml or Manifest.toml changes and triggers a full re-precompile, Docker will maintain a separate layer for MKL_jll. This separate layer can streamline subsequent builds, as MKL_jll will already be precompiled and cached independently of the other dependencies. This optimises for scenarios where MKL_jll remains unchanged while other dependencies are updated.

what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right?
Again, I can see from the Manifest.toml file that LinearAlgebra is configured to use MKL. If this is not essential, we could switch to using OpenBLAS instead. By running this after copying the Manifest.toml file, we ensure that the version specified in Manifest.toml is used. I have removed Pkg.add to prevent it from pulling a different version, ensuring the version from Manifest.toml is used.

why is this dependency special? Why not all the others? Is it just because it's slow?

MKL_jll is singled out because it is a large and resource-intensive dependency with a known tendency to cause significant build-time delays and runtime warnings if not precompiled. By precompiling MKL_jll explicitly during the Docker build phase, the process prevents these warnings and reduces the load at runtime. Other dependencies, generally less demanding, don’t typically require this upfront handling, which is why they aren't singled out in the same way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterBaker0 Correct me if I'm wrong - MKL_jll was already a dependency listed in Project/Manifest, right? So ..........

MKL_jll is included through dependencies like LinearAlgebra in Project.toml, meaning it will be precompiled during Pkg.instantiate() and Pkg.precompile(). Docker caching will indeed optimise repeat builds when Project.toml and Manifest.toml remain unchanged by reusing layers and skipping full re-precompilation.

The explicit precompilation of MKL_jll primarily benefits initial build times and mitigates startup warnings. By precompiling MKL_jll separately, we handle its specific overhead upfront, reducing runtime costs and potential resource bottlenecks during the first build. However, in subsequent cached builds, this layer’s impact is minimized as Docker caching will efficiently skip redundant precompilations when dependencies haven’t changed.

What do you mean by "initial build times" - why would initial build times be reduced? You're not actually doing less in the build, right?

"reducing runtime costs" - what runtime costs? Isn't this about decreasing build time? Are you saying the server starts faster?

"efficiently skip redundant precompilations when dependencies haven’t changed." I still don't understand why this is any different than how it was before? If the dependencies didn't change, then it would never have re-precompiled, right?

If we do want to pull out a specific dependency, I think it makes more sense to do so before we copy the manifest files

We need MKL_jll to be installed with the exact version specified in Manifest.toml, so it’s important to copy both Project.toml and Manifest.toml files first. With this approach, even if Project.toml or Manifest.toml changes and triggers a full re-precompile, Docker will maintain a separate layer for MKL_jll. This separate layer can streamline subsequent builds, as MKL_jll will already be precompiled and cached independently of the other dependencies. This optimises for scenarios where MKL_jll remains unchanged while other dependencies are updated.

I just don't think this is true. Docker cache layers will always be invalidated if the file hash of copied inputs changes. E.g.

  1. user adds package
  2. manifest file etc change
  3. this invalidates the copy layer and all layers below it
  4. MKL_jll layer needs to rebuild because step 3 invalidated this layer
  5. other packages need to rebuild

what if the MKL_jll version changes - this is no longer tracked in the Project/Manifest, right? Again, I can see from the Manifest.toml file that LinearAlgebra is configured to use MKL. If this is not essential, we could switch to using OpenBLAS instead. By running this after copying the Manifest.toml file, we ensure that the version specified in Manifest.toml is used. I have removed Pkg.add to prevent it from pulling a different version, ensuring the version from Manifest.toml is used.

Sure - but then you will need to recompile MKL_jll anytime the Manifest or Project toml files change, which removes the benefit of pulling it out separately anyway?

why is this dependency special? Why not all the others? Is it just because it's slow?

MKL_jll is singled out because it is a large and resource-intensive dependency with a known tendency to cause significant build-time delays and runtime warnings if not precompiled. By precompiling MKL_jll explicitly during the Docker build phase, the process prevents these warnings and reduces the load at runtime. Other dependencies, generally less demanding, don’t typically require this upfront handling, which is why they aren't singled out in the same way.

Sure - but you would need to pin this version into the dockerfile itself and build it before you bring in project/manifest toml.

Copy link
Collaborator

@ConnectedSystems ConnectedSystems Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts:

If MKL_jll does indeed contribute significantly to build times, then there would be benefit to precompiling it separately.

Unlike other approaches, say conda in Python, where a separate copy of the package is installed for each environment, Julia will share packages that are previously installed across environments so long as the versions align.

This would be beneficial in the case that the version installed by this line:

RUN julia -e 'using Pkg; Pkg.add("MKL_jll"); Pkg.instantiate(); using MKL_jll'

is indeed compatible with the version specified in the Project.toml and Manifest.toml files, as installation/compilation is effectively skipped given the previously installed package can be used instead.

But as soon as there is version misalignment (the line above initially installing an older version, and the ReefGuideAPI.jl package specifying a newer version), then everything needs to be re-compiled. But subsequent builds after this will be "fast" again.

In the case the Project.toml file changes, but does not affect the MKL_jll dependency, then we would still get the improved build times.

I think we need some way of running the line above again if/when the version of MKL_jll in the Project.toml file changes, otherwise it'll just silently keep building slowly...

Does this make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how Docker cache works (from my understanding).

Layer cache hash := hash(file system hash at start of layer, build command and files referenced in the build command)

If the project/manifest change at all, even just a new line or an extra whitespace, this hash is now different (since the build command/referenced files has changed). This will always invalidate all following layers.

So afaik this "In the case the Project.toml file changes, but does not affect the MKL_jll dependency, then we would still get the improved build times." is not true.

But I talked with @pascal082 and I am going to validate this is the case.

I'm not unconvinced that the build times improved, just trying to understand why.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming that the above explanation matches tested behaviour i.e. any change to dependencies (Manifest/Project files) results in the entire build occurring again including MKL_jll

@@ -1,5 +1,5 @@
# See https://hub.docker.com/_/julia for valid versions.
ARG JULIA_VERSION="1.10.5"
ARG JULIA_VERSION="1.11.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to do 1.11 so it's always latest minor, or use exact versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arlowhite It’s best to always confirm and specify the exact version you need.

Copy link
Collaborator

@PeterBaker0 PeterBaker0 Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always @pascal082 - particularly as a default. I think 1.11 is fine as well. If a user needs to specify an exact version they can always override the build arg.

If you run npm i <package> the default is ^x.y.z, which allows for patch version increments, for example.

Copy link
Collaborator

@ConnectedSystems ConnectedSystems Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think v1.11 is fine too

…nd cache it before main package precompilation.
Dockerfile Outdated
COPY ./Project.toml ./Project.toml
COPY ./Manifest.toml ./Manifest.toml
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(verbose=true)'
COPY Project.toml Manifest.toml ./
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Julia 1.11, it should be Manifest-v1.11.toml

Dockerfile Outdated
COPY Project.toml Manifest.toml ./

# Pre-download and cache MKL_jll dependency before general precompilation
RUN julia -e 'using Pkg; pkg_path = Pkg.TOML.parsefile("Manifest.toml")["packages"]["MKL_jll"]["git-tree-sha1"]; Pkg.develop(PackageSpec(name="MKL_jll", version=pkg_path)); Pkg.precompile()'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: KeyError: key "packages" not found

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this too

@PeterBaker0
Copy link
Collaborator

@pascal082 Given the above discussion - I think the following

  • changes to Julia version/build arg make sense with above comments
  • pulling out MKL_jll - it's not clear to me under what conditions this results in the reduced build times, are you referring to in the GitHub build pipeline, or your local machine? And what is the before/after - are these builds from scratch, or builds with a cache/partial cache? This is a bit of a variation from the norm in terms of building with dependencies so warrants an understanding of why this is saving time.

@pascal082
Copy link
Contributor Author

pascal082 commented Oct 28, 2024

@pascal082 Given the above discussion - I think the following

  • changes to Julia version/build arg make sense with above comments
  • pulling out MKL_jll - it's not clear to me under what conditions this results in the reduced build times, are you referring to in the GitHub build pipeline, or your local machine? And what is the before/after - are these builds from scratch, or builds with a cache/partial cache? This is a bit of a variation from the norm in terms of building with dependencies so warrants an understanding of why this is saving time.

@PeterBaker0
Just returning to this. I think the team initially set up is the ideal scenario: having the benefits of a pinned default version while allowing flexibility to test or move to newer versions as needed with the build command. I can adjust it to always pull the latest version in the Dockerfile, which can then be overridden with the build command, if that aligns with the team’s standard - @arlowhite

Regarding MKL: I believe what's happening is that by pre-installing MKL_jll, dependent packages like FFTW and LinearAlgebra recognize it immediately, skipping redundant configuration steps. This results in faster build times and fewer re-compilations. See the screenshot showing compile times for packages using MKL -pre install/not. This improvement is related to changes in Julia 1.11 affecting package compilation .

@arlowhite and @PeterBaker0 Also, since there will be different versions, could we name the Manifest.toml file following a version-specific naming convention? For example, for version 1.11.1, it could be named Manifest-1.11.1.toml, allowing the Dockerfile to select the appropriate version automatically.

With MKL pre install
image

Without MKL pre install
image

Seeing if this resolves horrendous build times
@arlowhite
Copy link
Contributor

In your MKL comparison, those look like different log sections. First in "Installed", second has time in ms, I'd assume precompile time.

@ConnectedSystems
Copy link
Collaborator

In discussion with @PeterBaker0 I've pushed an updated Project/Manifest file which explicitly specifies MKL_jll.

The thinking is that it will help the dependency resolver optimise the precompile order to avoid method invalidations.

Could someone test a build with this?

PeterBaker0 and others added 13 commits October 29, 2024 11:38
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
Signed-off-by: Peter Baker <[email protected]>
@PeterBaker0
Copy link
Collaborator

In discussion with @PeterBaker0 I've pushed an updated Project/Manifest file which explicitly specifies MKL_jll.

The thinking is that it will help the dependency resolver optimise the precompile order to avoid method invalidations.

Could someone test a build with this?

Doesn't help unfortunately.

@PeterBaker0
Copy link
Collaborator

@arlowhite @pascal082 I've pushed a revised workflow which pulls out the process of extracting the MKL_jll version into a separate script. I've also put some workflows which check this dependency is up to date, including before running a docker build.

The benefit of this approach is that the Dockerfile can now reference this dependency file before copying in the Manifest/Project files, so we don't need to rebuild MKL_jll any time the project dependencies change.

I tested these scenarios

7m48.721

install MKL
install everything

Vs

13m48.615s

install everything

And indeed @pascal082 's observation is correct that it does reduce build time very significantly.

With the latest approach, things seem even better (though probably within margin of error for system performance), last build was 5m38.439s.

@pascal082 you should take a look and make sure changes make sense.

@pascal082
Copy link
Contributor Author

pascal082 commented Oct 29, 2024

@PeterBaker0 The remaining issue is ensuring we use the appropriate Manifest.toml file (in get_mkl.sh) that matches the Julia version we are working with

get_mkl.sh

Check if Manifest.toml exists

if [ ! -f "Manifest.toml" ]; then
echo "Error: Manifest.toml not found in current directory"
exit 1
fi

- name: Extract current MKL hash from Manifest
id: extract
run: |
CURRENT_HASH=$(awk '/\[\[deps\.MKL_jll\]\]/{flag=1;next}/\[\[/{if(flag){exit}}flag&&/uuid/{gsub(/^.*= "|".*$/,"",$0);printf "%s", $0}' Manifest.toml)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this to be pragmatic, but noting in future we should aim to remove awkward use of awk.
Could we at least add a comment explaining what this magic line does?

@arlowhite
Copy link
Contributor

arlowhite commented Oct 29, 2024

I don't like all of this complexity escaping the Dockerfile. This works:

COPY Project.toml Manifest*.toml ./

# Compile MKL_jll first - this improves build time significantly - unsure exactly why
# Extract version from Pkg.status output
# Status `~/foo/Project.toml`
#  [856f044c] MKL_jll v2024.2.0+0
# +0 causes 'ERROR: ArgumentError: invalid base 10 digit '+' in "0+0"'
# sed regex is weird, couldn't get it to trim the '+', so using cut
RUN export MKL_JLL_VERSION=`julia --project=. -e 'using Pkg; Pkg.status("MKL_jll")' | sed -rn 's/.*MKL_jll v()/\1/p' | cut -d '+' -f 1` && \
  julia -e 'using Pkg; Pkg.add(name="MKL_jll", version=ENV["MKL_JLL_VERSION"]); Pkg.precompile()'

# Precompile Julia packages using BuildKit cache for better efficiency
RUN julia --project=@reefguide -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

Also, the "hash" is a project/package UUID, it never changes. version is what specifies the MKL_jll version. We could use uuid if we're paranoid about "MKL_jll" name collision, but I don't think it's necessary.

EDIT: problem here is it requires moving the manifest copy above, which will invalidate the cache more often.

@ConnectedSystems
Copy link
Collaborator

ConnectedSystems commented Oct 29, 2024

Very zen Arlo, I prefer this approach.

@PeterBaker0 PeterBaker0 requested a review from arlowhite October 30, 2024 01:24
@PeterBaker0
Copy link
Collaborator

@arlowhite @ConnectedSystems @pascal082 have simplified the approach using combination of methods mentioned in conversations. Removed all github actions to sync mkl.dep as it was not worth complexity for meagre optimisations which are not relevant to github actions anyway.

See https://github.com/open-AIMS/ReefGuideAPI.jl/actions/runs/11585249026 for test build running now.

@PeterBaker0
Copy link
Collaborator

@PeterBaker0 The remaining issue is ensuring we use the appropriate Manifest.toml file (in get_mkl.sh) that matches the Julia version we are working with

get_mkl.sh

Check if Manifest.toml exists

if [ ! -f "Manifest.toml" ]; then echo "Error: Manifest.toml not found in current directory" exit 1 fi

@pascal082 this is resolved now - have added Manifest* instead of Manifest to docker copy, and the above script no longer exists

@@ -28,6 +28,7 @@ JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
JSONWebTokens = "9b8beb19-0777-58c6-920b-28f749fee4d3"
LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MKL_jll = "856f044c-d86e-5d09-b602-aeab76dc8ba7"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe more correct to put in weakdeps, but not sure if it affects how Julia operates

@arlowhite arlowhite merged commit adf7ddc into main Nov 11, 2024
1 check passed
@ConnectedSystems ConnectedSystems deleted the pascal branch November 11, 2024 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants