Add crate index caching guidance #274

Closed
wants to merge 2 commits into from

@alnoki alnoki commented Jul 17, 2024

Crate index caching

This PR adds documentation on how to cache a local crate index when working with
workspaces that have large git dependencies. I devised this method after
noticing that, during a cargo-chef build, the cargo chef cook step cloned a git
dependency (namely, aptos-core) that the target binary does not use, because
compilation requires a complete local crate index.

The documentation in the PR goes into detail about the mechanisms at play, and
below I'm including an example for additional illustrative purposes.

Related:

Example

Layout

Consider the following workspace:

├── Cargo.toml
├── Dockerfile
├── my_package
│   ├── Cargo.toml
│   └── my_bin.rs
└── another_package
    ├── Cargo.toml
    └── another_bin.rs

The top-level Cargo.toml file defines a workspace with two member packages:

[workspace]
members = [
  "my_package",
  "another_package"
]
resolver = "2"

[workspace.package]
edition = "2021"
rust-version = "1.79.0"

The Dockerfile is identical to the template proposed in this PR:

FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
ARG BIN
COPY . .
# Prepare recipe one directory up to simplify local crate index caching.
RUN cargo chef prepare --bin "$BIN" --recipe-path ../recipe.json
# Delete everything not required to build complete local crate index, to avoid
# invalidating local crate index cache on code changes or recipe updates.
RUN find -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete && \
    find -type d -empty -delete

# Invoke a dry run lockfile update against the manifest skeleton, thereby
# caching a complete local crate index.
FROM chef AS indexer
COPY --from=planner /app .
RUN cargo update --dry-run

FROM chef AS builder
ARG BIN PACKAGE
COPY --from=planner /recipe.json recipe.json
# Copy cached crate index.
COPY --from=indexer $CARGO_HOME $CARGO_HOME
# Build in locked mode to prevent local crate index cache invalidation, thereby
# downloading only the necessary dependencies for the binary.
RUN cargo chef cook --bin "$BIN" --locked --package "$PACKAGE" --release
COPY . .
# Build offline solely from cached crate index and downloaded dependencies.
RUN cargo build --bin "$BIN" --frozen --package "$PACKAGE" --release
# Rename executable for ease of copying.
RUN mv "/app/target/release/$BIN" /app/executable;

FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/executable /usr/local/bin
ENTRYPOINT ["/usr/local/bin/executable"]
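
The pruning step in the planner stage can be tried outside Docker. The following
is a minimal sketch, assuming a POSIX shell and GNU find. It recreates the
example layout with empty placeholder files in a temporary directory, and the
docs/images directory and logo.png file are hypothetical additions used only to
show that directories without manifests are removed entirely.

```shell
#!/bin/sh
set -eu

# Build a throwaway copy of the example workspace layout. File contents are
# empty placeholders; only the names matter for this demonstration.
workdir="$(mktemp -d)"
mkdir -p "$workdir/my_package" "$workdir/another_package"
touch "$workdir/Cargo.toml" "$workdir/Cargo.lock" "$workdir/Dockerfile"
touch "$workdir/my_package/Cargo.toml" "$workdir/my_package/my_bin.rs"
touch "$workdir/another_package/Cargo.toml" "$workdir/another_package/another_bin.rs"

# Hypothetical extra directory that contains no manifests at all.
mkdir -p "$workdir/docs/images"
touch "$workdir/docs/images/logo.png"

# Same two commands as the planner stage, with an explicit "." start path.
cd "$workdir"
find . -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete
find . -type d -empty -delete

# List what survives the pruning.
remaining="$(find . -type f | LC_ALL=C sort)"
echo "$remaining"
```

Only the lockfile and the three Cargo.toml manifests should remain, which is the
manifest skeleton that the indexer stage copies from the planner stage.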

The Cargo.toml for my_package has no special dependencies:

[[bin]]
name = "my-bin"
path = "my_bin.rs"

[package]
edition = "2021"
name = "my_package"
version = "1.0.0"

And my_bin.rs prints a simple "Hello, world!" message:

fn main() {
    println!("Hello, world!")
}

However, the Cargo.toml for another_package has a git dependency on
aptos-core (note that per
aptos-core #8984
there is no plan to support package management on crates.io):

[[bin]]
name = "another-bin"
path = "another_bin.rs"

[dependencies.move-core-types]
git = "https://github.com/aptos-labs/aptos-core"
tag = "aptos-node-v1.15.2"

[package]
edition = "2021"
name = "another_package"
version = "1.0.0"

Note that another_bin.rs has a modified "Hello, world!" statement, which
relies on a random account address generated via the move-core-types
dependency:

use move_core_types::account_address::AccountAddress;

fn main() {
    println!("Hello, {}!", AccountAddress::random());
}

Cache hit dynamics

To follow along, replicate the above workspace. Then generate a lockfile:

cargo check

To build and run my-bin via cargo-chef:

docker build \
    --build-arg="BIN=my-bin" \
    --build-arg="PACKAGE=my_package" \
    --tag my-bin \
    .
docker run my-bin
Hello, world!

Note that this downloads the entire aptos-core repository during the
cargo update --dry-run step, since the eventual cargo chef cook operation
requires a local crate index:

 => [indexer 2/2] RUN cargo update --dry-run

However, if my_bin.rs is then modified to print Hello, chef! instead,
re-building the image does not download the repository again, because the crate
index that includes the aptos-core git dependency is already cached.

To run another-bin:

docker build \
    --build-arg="BIN=another-bin" \
    --build-arg="PACKAGE=another_package" \
    --tag another-bin \
    .
docker run another-bin
Hello, 0xa53c237d4f6fd71c6355254a36ecaa8fed0269430669131d21a27c732d66b18e!

Here, the local image cache reuses the output of the --dry-run crate index
generation step, since the Cargo.toml manifest skeleton is common across both
builds in the workspace.

Moreover, updating another_bin.rs to print Goodbye, ... results in another
cache hit since there are no new dependencies.
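
One way to see why source edits leave the indexer stage untouched is to
fingerprint the manifest skeleton before and after a change. The sketch below is
an illustration only: it does not reproduce how Docker computes layer cache
keys, it assumes GNU coreutils (sha256sum), and skeleton_digest is a
hypothetical helper name.

```shell
#!/bin/sh
set -eu

# Hash the names and contents of every manifest and lockfile under the
# current directory, visiting them in a stable order.
skeleton_digest() {
    find . -type f \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) \
        | LC_ALL=C sort \
        | xargs sha256sum \
        | sha256sum \
        | cut -d ' ' -f 1
}

# Minimal stand-in for the example workspace.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir my_package
printf '[workspace]\nmembers = ["my_package"]\n' > Cargo.toml
printf '[package]\nname = "my_package"\nversion = "1.0.0"\n' > my_package/Cargo.toml
printf 'fn main() { println!("Hello, world!") }\n' > my_package/my_bin.rs

before="$(skeleton_digest)"

# A source-only edit does not touch the skeleton.
printf 'fn main() { println!("Hello, chef!") }\n' > my_package/my_bin.rs
after_source_edit="$(skeleton_digest)"

# A manifest edit does.
printf '\n[dependencies]\n' >> my_package/Cargo.toml
after_manifest_edit="$(skeleton_digest)"

[ "$before" = "$after_source_edit" ] && echo "source edit: skeleton unchanged"
[ "$before" != "$after_manifest_edit" ] && echo "manifest edit: skeleton changed"
```

The script reports the skeleton as unchanged after the source edit and as
changed after the manifest edit, which mirrors the cache hits described above.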

Cache miss dynamics

The local crate index caching step can be disabled by commenting out the
following line in the Dockerfile:

COPY --from=indexer $CARGO_HOME $CARGO_HOME

In this case, the cargo chef cook command has no access to a local crate index
cache, and it must regenerate the index whenever a recipe changes. Notably,
this involves re-downloading aptos-core even for changes to my_package that
have nothing to do with the dependency.

alnoki commented Jul 22, 2024

I am closing this because I realized that the operations described in this PR are effectively already taken care of by cargo chef cook.

@alnoki alnoki closed this Jul 22, 2024