Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GHA pipeline rewrite for ease and speed #1551

Merged
merged 71 commits into from
Nov 17, 2023
Merged

GHA pipeline rewrite for ease and speed #1551

merged 71 commits into from
Nov 17, 2023

Conversation

gpmayorga
Copy link
Contributor

@gpmayorga gpmayorga commented Sep 15, 2023

Description

Main goal was to activate an efficient cache system to improve build time of the chain nodes and wasm builds. But since every single workflow file needed touching I started consolidating code and adding other improvements resulting in an almost full re-write of our pipelines.

Ref (internal) document: https://centrifuge.hackmd.io/Ftgot4ilSxK5ERyrEujnBg?view

Changes and Descriptions

Cache features

This PR introduces 3 types of cache systems:

  1. Cross-PR and branch cache setup with a GCS bucket through "smart" Mozilla's sccache. Any workflow using Rust can take advantage of this by using actions/prep-ubuntu with some parameters
    Used by: sanity checks, benchmarks, linters
  2. srtool WASM build cache
    Using the same cache system as our current pipelines with some enhancements (Swatinem/rust-cache). Getsmounted on the srtool container.
    Used by: wasm runtime builds
  3. Docker layer caching with docker buildx
    Using buildx cache system, setup to use the registry as a cache but can be setup with GHA native cache system
    Used by: docker build

To review:

Benhcmarks

Now benchmarks will run on the main branch for every push and create a PR with the new weights (creating a PR from a bot has been disabled org-wide by an admin after this, woops!).
The benchmark job will upload the benchmark results to the job's page.
To review:

  • Benchmark cache - The job takes over 1h each time, how to lower it?
  • benchmark-test - Longest job on the "PR tests". Do we need it? Can we make it shorter/faster?

Docker builds

  • Modified and improved Dockerfile for speed and stability, inspired in the polkadot-sdk Dockerfile.
  • Standardize the Docker tags using automated docker-metadata.
    To review:
  • Decide on the Docker metadata strategy
  • If we have to tell a third party to take the "latest released tag". What would it be?
  • Decide on the docker build cache: GHA backed.

New features/enhancements

Checklist to set this PR as "ready"

  • Try out sccache GHA module: does not work with srtool container (build wasms)
  • Add a different cache system for WASM builds. -> Swatinem/rust-cache GH Action
  • Add cache for Docker builds
  • Review and refactor GHA workflows
    - [ ] Makefile to consolidate CI operations into a single source
  • (Optional) Decide where to uplaod and store artifacts from the builds -> new storage bucket in prod account w/ (very) restricted access.
  • Try out different machine sizes in GHA to find the most optimized $/time ratio.
  • Play out with cache options and tag groups
  • Testing caches:
    • WASM build
    • Docker build
    • Benchmarks
    • Tests and linters
  • Figure out something clever to upload benchmarks regularly and ping devs (upload to the benchmark job)
  • Try-runtime for runtime upgrades
  • PR cleanup
  • Review this PR with someone from Protocol (thanks @lemunozm and @wischli )
  • Change the PR checks for the new ones

@gpmayorga gpmayorga force-pushed the ci-rewrite-n-cache branch 4 times, most recently from dc4ecc0 to 54bc71f Compare September 15, 2023 04:19
Copy link
Contributor

@wischli wischli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, there are some removable artifacts left.

.github/workflows/build-wasm.yml Outdated Show resolved Hide resolved
.github/workflows/build-wasm.yml Outdated Show resolved Hide resolved
.github/workflows/build-wasm.yml Outdated Show resolved Hide resolved
.github/workflows/build-wasm.yml Outdated Show resolved Hide resolved
.github/workflows/run-benchmarks.yml Outdated Show resolved Hide resolved
.github/workflows/sanity-checks.yml Outdated Show resolved Hide resolved
wischli
wischli previously approved these changes Nov 17, 2023
Copy link
Contributor

@wischli wischli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking the time to pimp up our CI. Looking awesome and blazingly fast 🚀

@gpmayorga gpmayorga enabled auto-merge (squash) November 17, 2023 11:58
strategy:
matrix:
runtime: [development, altair, centrifuge]
runtime: [altair, centrifuge]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not development?

Copy link
Contributor

@wischli wischli Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we didnt need to check dev benchmarks to save time. I know this kind of contradicts my argument of keeping benchmarks in Dev. However, in the past releases I merged Centrifuge benchmarks to Dev because there was no runtime benchmark pipeline. I think thats feasible because this way we mimic mainnet.

How about we iterate over that when we build the next feature, which will probably first only exist in Dev runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this are PR checks, let's keep it minimal, this won't actually publish or set benchmarks anywhere, it's just to check that the benchmarks run
Do we really need all 3 to run?

Comment on lines 5 to 6
.github/workflows @gpmayorga
.github/actions @gpmayorga
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add @wischli too, for the bus factor

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac #v4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curious: Why pointing to a hash directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unlikely case that the actions/checkout repository introduces a bad update or let's say someone hacks it and introduces malicious code, our pipeline will run the update without questions if we do v1 vs the code commit. It's a way of freezing the code we run from others.
Keep in mind some of the CI pipelines have access to our Google Cloud credentials so a malicious change can potentially obtain access too.
More info: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions

runs-on: ubuntu-latest-8-cores
strategy:
matrix:
runtimes: [centrifuge, altair]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now Demo uses development we should add also here the weights. See this slack thread

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add development here. But lets have this in a second PR where we should also add the possibility to build development wasm for upgrades in dev and demo.

strategy:
matrix:
target: [test-general, test-integration,
lint-fmt, lint-clippy, cargo-build] # ,lint-taplo]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be list-taple reenable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our intention was to re-enable it in a follow up PR in case it doesnt work out of the box. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I enabled it it failed consistently, I think it requires a bit of work, I agree on dealing with it in separate PRs

echo "---- GENERATE CODE COVERAGE ----"
echo "# Install Tarpaulin"
cargo install --locked cargo-tarpaulin
# make Cargo.toml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be this line removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not used right now but kept as a wip state to introduce code cov.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be a separate PR

wischli
wischli previously approved these changes Nov 17, 2023
Copy link
Contributor

@wischli wischli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving. Unfortunately, my approval is not sufficient.

@gpmayorga
Copy link
Contributor Author

Copy link
Collaborator

@mustermeiszer mustermeiszer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will approve based on the knoweledge I have but more on the fact that @wischli and @gpmayorga spent so much time on this. Thanks a lot for the improvements!!!

runs-on: ubuntu-latest-8-cores
strategy:
matrix:
runtimes: [centrifuge, altair]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add development here. But lets have this in a second PR where we should also add the possibility to build development wasm for upgrades in dev and demo.

@gpmayorga gpmayorga merged commit 8d78e6f into main Nov 17, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants