Add CI on linux-riscv64 #13854

Draft
luhenry wants to merge 1 commit into k3s-io:main from luhenry:main

luhenry commented Mar 24, 2026

Proposed Changes

Types of Changes

Verification

Testing

Linked Issues

User-Facing Change


Further Comments

cc @pgonin

luhenry force-pushed the main branch 4 times, most recently from 4eea5de to 467b70f (March 24, 2026 23:34)
Author

luhenry commented Mar 25, 2026

Fixing missing gh cli at riseproject-dev/riscv-runner-images#17

luhenry force-pushed the main branch 3 times, most recently from c8f28e0 to 26e0135 (March 25, 2026 16:17)
Author

luhenry commented Mar 26, 2026

Looking at the timing differences between build-arm64 and build-riscv64:

Overall: riscv64 is ~10-11x slower than arm64 across the board.

Key metrics:

  • Median per-package slowdown: 10.2x (very consistent)
  • p90: 15.5x, p99: 25.6x, max: 42x
  • Minimum: 1.5x, so even the fastest riscv64 compiles are still slower than their arm64 counterparts

Top bottlenecks on riscv64:

| Package | arm64 | riscv64 | Ratio |
| --- | --- | --- | --- |
| runtime | 6.2s | 57.1s | 9.2x |
| k8s.io/api/core/v1 | 4.5s | 53.5s | 12.0x |
| reflect | 3.2s | 36.6s | 11.6x |
| net | 2.5s | 26.3s | 10.3x |
| net/http | 1.8s | 22.9s | 12.7x |
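
For anyone who wants to redo this comparison on their own build logs, here is a minimal sketch of the ratio calculation. It uses only the five rows from the table above as sample data; the `TIMINGS` list and `slowdowns` helper are illustrative names, not part of any k3s tooling. Because the table values are rounded to 0.1s, recomputed ratios can differ by a few tenths from the posted ones, and the median here covers just these five packages, not the full package set behind the 10.2x figure.

```python
from statistics import median

# (package, arm64 seconds, riscv64 seconds), copied from the table above.
TIMINGS = [
    ("runtime", 6.2, 57.1),
    ("k8s.io/api/core/v1", 4.5, 53.5),
    ("reflect", 3.2, 36.6),
    ("net", 2.5, 26.3),
    ("net/http", 1.8, 22.9),
]


def slowdowns(timings):
    """Return {package: riscv64_time / arm64_time}."""
    return {pkg: riscv / arm for pkg, arm, riscv in timings}


ratios = slowdowns(TIMINGS)

# Print the packages from largest to smallest slowdown.
for pkg, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pkg:<22} {ratio:5.1f}x")

print(f"median of these five: {median(ratios.values()):.1f}x")
```

The same function works on a full per-package timing dump, as long as both architectures report the same package names.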

Wall-clock for key build steps:

  • scripts/build: arm64=3m55s vs riscv64=40m28s (10.4x)
  • scripts/package-cli: arm64=22s vs riscv64=2m31s (6.7x)
  • docker buildx build Dockerfile.local: arm64=5m46s vs riscv64=48m32s (8.4x) — note these are cumulative buildkit step times
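
The wall-clock ratios can be rechecked with a small duration parser. This is a sketch under the assumption that durations are always written as `<minutes>m<seconds>s` or `<seconds>s`, as in the list above; `parse_duration` and `STEPS` are hypothetical names. The inputs are whole seconds, so the recomputed ratios may not match the posted ones exactly.

```python
import re

# Matches strings such as "3m55s", "40m28s" or "22s".
_DURATION = re.compile(r"^(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text):
    """Convert a string like '3m55s' or '22s' to a number of seconds."""
    match = _DURATION.match(text)
    if not text or not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)


# step -> (arm64 duration, riscv64 duration), copied from the list above.
STEPS = {
    "scripts/build": ("3m55s", "40m28s"),
    "scripts/package-cli": ("22s", "2m31s"),
    "docker buildx build Dockerfile.local": ("5m46s", "48m32s"),
}

for step, (arm, riscv) in STEPS.items():
    ratio = parse_duration(riscv) / parse_duration(arm)
    print(f"{step}: {ratio:.1f}x")
```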

The slowdown is remarkably uniform at ~10-12x for compilation. The outliers (25-42x) tend to be tiny packages where the ratio is amplified by fixed overhead. The user time tracks closely with real time, confirming this is CPU-bound work, not I/O.

Author

luhenry commented Mar 26, 2026

I'm also adding kubectl to the base image at riseproject-dev/riscv-runner-images#24

Author

luhenry commented Mar 26, 2026

riscv64 CI Failure Classification

Run: https://github.com/luhenry/k3s/actions/runs/23554260675

Every failure is riscv64-only — all amd64 and arm64 jobs passed.

Category 1: Missing rancher/systemd-node:v0.0.8 image for riscv64 (7 jobs)

These fail because the rancher/systemd-node:v0.0.8 Docker image doesn't exist for riscv64:

| Job | Error |
| --- | --- |
| hardened | Unable to find image 'rancher/systemd-node:v0.0.8' |
| autoimport | Unable to find image 'rancher/systemd-node:v0.0.8' |
| snapshotrestore | Unable to find image 'rancher/systemd-node:v0.0.8' |
| dualstack | Unable to find image 'rancher/systemd-node:v0.0.8' |
| svcpoliciesandfirewall | Unable to find image 'rancher/systemd-node:v0.0.8' |
| secretsencryption | Unable to find image 'rancher/systemd-node:v0.0.8' |
| token | Unable to find image 'rancher/systemd-node:v0.0.8' |

Category 2: Missing rancher/k3s release images for riscv64 (2 jobs)

| Job | Error |
| --- | --- |
| skew | Unable to find image 'rancher/k3s:v1.34.5-k3s1' |
| upgrade | Unable to find image 'rancher/k3s:v1.35.2-k3s1' |

These tests need prior K3s release images which aren't published for riscv64.

Category 3: Missing rancher/mirrored-pause:3.6 image for riscv64 (4 jobs)

The rancher/mirrored-pause:3.6 manifest has no riscv64 platform entry (no match for platform in manifest: not found). This prevents any pod sandbox from starting, which causes coredns (and all other pods) to never come up, ultimately hitting the test timeout.

| Job | Error |
| --- | --- |
| etcd | Timed out after 120.000s; failed to pull image "rancher/mirrored-pause:3.6": no match for platform in manifest: not found |
| basics | Timed out after 180.001s; failed to pull image "rancher/mirrored-pause:3.6": no match for platform in manifest: not found |
| bootstraptoken | Timed out after 120.000s; failed to pull image "rancher/mirrored-pause:3.6": no match for platform in manifest: not found |
| cacerts | failed to run command: docker run ... rancher/mirrored-pause:3.6, exit status 125 |
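
To check ahead of time whether an image publishes a riscv64 variant, one option is to inspect its manifest list. The sketch below assumes the JSON layout of a Docker manifest list / OCI image index (a top-level `manifests` array whose entries carry a `platform` object). `supports_platform`, `fetch_index`, and `example_index` are hypothetical names, and the example data is made up for illustration rather than copied from a real registry response.

```python
import json
import subprocess


def supports_platform(index, architecture, os_name="linux"):
    """Return True if the manifest list has an entry for the given platform.

    A single-platform image manifest has no "manifests" array, so this
    returns False for it; such images need a separate check.
    """
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if (platform.get("architecture") == architecture
                and platform.get("os") == os_name):
            return True
    return False


def fetch_index(image):
    """Fetch the manifest list with `docker manifest inspect`.

    Requires a local Docker CLI and network access to the registry.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical index that only lists amd64 and arm64, for illustration.
example_index = {
    "manifests": [
        {"platform": {"architecture": "amd64", "os": "linux"}},
        {"platform": {"architecture": "arm64", "os": "linux"}},
    ]
}

print(supports_platform(example_index, "riscv64"))
```

With Docker available, `supports_platform(fetch_index("rancher/mirrored-pause:3.6"), "riscv64")` would run the same check against the live registry.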

Category 4: Nix not supported on riscv64 (1 job)

| Job | Error |
| --- | --- |
| nixsnapshotter | ArchOs (RISCV64-Linux) doesn't map to a supported Nix platform |

Workaround: disable the Nix jobs on riscv64, as on arm64.

Category 5: Test setup issue, bind mount path doesn't exist (1 job)

| Job | Error |
| --- | --- |
| lazypull | bind source path does not exist: /tmp/k3s-test-4264788357/server-0.yaml |

Fix: riseproject-dev/riscv-runner-app@1fea2f7

Summary

| Root Cause | Jobs |
| --- | --- |
| Missing rancher/systemd-node:v0.0.8 riscv64 image | 7 |
| Missing rancher/mirrored-pause:3.6 riscv64 platform in manifest | 4 |
| Missing rancher/k3s prior release images for riscv64 | 2 |
| Nix doesn't support riscv64 | 1 |
| Test setup bug (config file not created before bind mount) | 1 |
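
The classification above could be automated roughly as follows. This is a sketch, not the method used to produce the summary: the regular expressions in `RULES` are illustrative guesses derived from the error strings quoted in this comment, and `FAILURES` simply restates the job/error pairs from the tables.

```python
import re
from collections import Counter

# (pattern, root cause); the first matching rule wins.
RULES = [
    (re.compile(r"Unable to find image 'rancher/systemd-node:"),
     "missing systemd-node image"),
    (re.compile(r"Unable to find image 'rancher/k3s:"),
     "missing prior k3s release image"),
    (re.compile(r"rancher/mirrored-pause:"),
     "missing mirrored-pause platform"),
    (re.compile(r"doesn't map to a supported Nix platform"),
     "Nix unsupported"),
    (re.compile(r"bind source path does not exist"),
     "test setup bug"),
]

SYSTEMD = "Unable to find image 'rancher/systemd-node:v0.0.8'"
PAUSE = ('failed to pull image "rancher/mirrored-pause:3.6": '
         'no match for platform in manifest: not found')

# job -> error message, restated from the tables above.
FAILURES = {
    "hardened": SYSTEMD, "autoimport": SYSTEMD, "snapshotrestore": SYSTEMD,
    "dualstack": SYSTEMD, "svcpoliciesandfirewall": SYSTEMD,
    "secretsencryption": SYSTEMD, "token": SYSTEMD,
    "skew": "Unable to find image 'rancher/k3s:v1.34.5-k3s1'",
    "upgrade": "Unable to find image 'rancher/k3s:v1.35.2-k3s1'",
    "etcd": PAUSE, "basics": PAUSE, "bootstraptoken": PAUSE,
    "cacerts": "failed to run command: docker run ... "
               "rancher/mirrored-pause:3.6, exit status 125",
    "nixsnapshotter": "ArchOs (RISCV64-Linux) doesn't map to a "
                      "supported Nix platform",
    "lazypull": "bind source path does not exist: "
                "/tmp/k3s-test-4264788357/server-0.yaml",
}


def classify(error):
    """Return the root cause for an error message, or 'unclassified'."""
    for pattern, cause in RULES:
        if pattern.search(error):
            return cause
    return "unclassified"


counts = Counter(classify(error) for error in FAILURES.values())
for cause, jobs in counts.most_common():
    print(f"{jobs:2d}  {cause}")
```

Any job that lands in `unclassified` on a later run would point to a new failure mode worth a manual look.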

@shanduur

From my observation, compiling riscv64 on arm64 is a little bit faster than riscv64 on amd64.

Author

luhenry commented Mar 26, 2026

> From my observation, compiling riscv64 on arm64 is a little bit faster than riscv64 on amd64.

@shanduur this is building on native RISC-V hardware, so no QEMU involved.
