Add crate index caching guidance #274

alnoki · 2024-07-17T02:12:53Z

Crate index caching

This PR adds documentation on how to cache a local crate index when working with
workspaces that have large git dependencies. I devised this method after
noticing that, during a cargo-chef build, the cargo chef cook step was cloning a
non-target git dependency (namely, aptos-core), since compilation requires a
complete local crate index.

The documentation in the PR goes into detail about the mechanisms at play, and
the example below offers a further illustration.

Layout

Consider the following workspace:

├── Cargo.toml
├── Dockerfile
├── my_package
│   ├── Cargo.toml
│   └── my_bin.rs
└── another_package
    ├── Cargo.toml
    └── another_bin.rs

The top-level Cargo.toml file declares two workspace members:

[workspace]
members = [
  "my_package",
  "another_package"
]
resolver = "2"

[workspace.package]
edition = "2021"
rust-version = "1.79.0"

The Dockerfile is identical to the template proposed in this PR:

FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
ARG BIN
COPY . .
# Prepare recipe one directory up to simplify local crate index caching.
RUN cargo chef prepare --bin "$BIN" --recipe-path ../recipe.json
# Delete everything not required to build complete local crate index, to avoid
# invalidating local crate index cache on code changes or recipe updates.
RUN find -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete && \
    find -type d -empty -delete

# Invoke a dry run lockfile update against the manifest skeleton, thereby
# caching a complete local crate index.
FROM chef AS indexer
COPY --from=planner /app .
RUN cargo update --dry-run

FROM chef AS builder
ARG BIN PACKAGE
COPY --from=planner /recipe.json recipe.json
# Copy cached crate index.
COPY --from=indexer $CARGO_HOME $CARGO_HOME
# Build in locked mode to prevent local crate index cache invalidation, thereby
# downloading only the necessary dependencies for the binary.
RUN cargo chef cook --bin "$BIN" --locked --package "$PACKAGE" --release
COPY . .
# Build offline solely from cached crate index and downloaded dependencies.
RUN cargo build --bin "$BIN" --frozen --package "$PACKAGE" --release
# Rename executable for ease of copying.
RUN mv "/app/target/release/$BIN" /app/executable

FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/executable /usr/local/bin
ENTRYPOINT ["/usr/local/bin/executable"]
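
The planner stage's pruning step can be tried outside Docker to see what it leaves behind. The following is a minimal sketch, not part of the PR: it builds a throwaway tree that mirrors the example layout, adds a hypothetical docs directory purely for illustration, and runs the same two find expressions. It passes `.` explicitly as the search path, which the Dockerfile omits (GNU find accepts that), and it assumes a find that supports `-delete` (GNU or BSD).

```shell
#!/bin/sh
# Throwaway tree mirroring the example workspace (all files are empty stubs).
demo=$(mktemp -d)
mkdir -p "$demo/my_package" "$demo/another_package" "$demo/docs"
touch "$demo/Cargo.toml" "$demo/Cargo.lock" "$demo/Dockerfile" \
      "$demo/my_package/Cargo.toml" "$demo/my_package/my_bin.rs" \
      "$demo/another_package/Cargo.toml" "$demo/another_package/another_bin.rs" \
      "$demo/docs/notes.md"  # hypothetical extra file, not in the PR's layout

cd "$demo" || exit 1

# Same expressions as the planner stage: keep only manifests and the lockfile,
# then remove any directories left empty.
find . -type f \! \( -name 'Cargo.toml' -o -name 'Cargo.lock' \) -delete
find . -type d -empty -delete

# List what survived; LC_ALL=C keeps the sort order locale-independent.
remaining=$(find . -type f | LC_ALL=C sort)
echo "$remaining"
```

Only the workspace manifest, the lockfile, and the two package manifests should remain, which is why edits to source files do not change the input to the indexer stage.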

The Cargo.toml for my_package has no special dependencies:

[[bin]]
name = "my-bin"
path = "my_bin.rs"

[package]
edition = "2021"
name = "my_package"
version = "1.0.0"

And my_bin.rs declares a simple "Hello, world!" statement:

fn main() {
    println!("Hello, world!")
}

However, the Cargo.toml for another_package has a git dependency on aptos-core
(note that per aptos-core #8984 there is no plan to support package management
on crates.io):

[[bin]]
name = "another-bin"
path = "another_bin.rs"

[dependencies.move-core-types]
git = "https://github.com/aptos-labs/aptos-core"
tag = "aptos-node-v1.15.2"

[package]
edition = "2021"
name = "another_package"
version = "1.0.0"

Note that another_bin.rs has a modified "Hello, world!" statement, which
relies on a random account address generated via the move-core-types
dependency:

use move_core_types::account_address::AccountAddress;

fn main() {
    println!("Hello, {}!", AccountAddress::random());
}

Cache hit dynamics

To follow along, replicate the above workspace. Then generate a lockfile:

cargo check

To build and run my-bin via cargo-chef:

docker build \
    --build-arg="BIN=my-bin" \
    --build-arg="PACKAGE=my_package" \
    --tag my-bin \
    .
docker run my-bin

Hello, world!

Note that this downloads the entire aptos-core repository during the --dry-run
step, since a local crate index is required for the eventual cargo chef cook
operation:

 => [indexer 2/2] RUN cargo update --dry-run

However, if my_bin.rs is then modified to print Hello, chef! instead,
re-building the image does not download the repository again, since the crate
index for the aptos-core git dependency is already cached.

To run another-bin:

docker build \
    --build-arg="BIN=another-bin" \
    --build-arg="PACKAGE=another_package" \
    --tag another-bin \
    .
docker run another-bin

Hello, 0xa53c237d4f6fd71c6355254a36ecaa8fed0269430669131d21a27c732d66b18e!

Here, the local image cache preserves the output for the --dry-run crate index
generation step, since the Cargo.toml manifest skeleton is common across both
builds in the workspace.

Moreover, updating another_bin.rs to print Goodbye, ... results in another
cache hit since there are no new dependencies.

Cache miss dynamics

The local crate index caching step can be disabled by commenting out the
following line in the Dockerfile:

COPY --from=indexer $CARGO_HOME $CARGO_HOME

In this case, the cargo chef cook command has no access to a local crate index
cache and must regenerate the index whenever the recipe changes. Notably, this
involves re-downloading aptos-core even for changes to my_package that have
nothing to do with the dependency.

alnoki · 2024-07-22T19:31:18Z

I am closing this because I realized that the operations described here are effectively already handled by cargo chef cook.

alnoki added 2 commits July 15, 2024 20:58

Add crate index caching guidance

b86d0da

Link to PR

bd31ad7

alnoki closed this Jul 22, 2024
