cuda CI has stopped building #309

Open
tfoote opened this issue Jan 25, 2025 · 0 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed), nvidia (Connected to the nvidia extension)

tfoote (Collaborator) commented Jan 25, 2025

The cuda CI job has started failing: 0.2.18, which passed last week, failed when I reran the action. The Docker build seems to fail immediately after cuda-toolkit is installed. My current speculation is that it's hitting a maximum layer size and failing to save the layer, so the build can't continue.

I'm going to disable cuda on CI, since it is very big and has stopped working. It is also quite slow, because installing the toolkit alone pulls in ~9GB of content, which is very close to the maximum upload size:

https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#troubleshooting

The Container registry has a 10 GB size limit for each layer.
The Container registry has a 10 minute timeout limit for uploads.
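A rough way to see how close an image's layers come to that limit is to list each layer's size in bytes and compare it against 10 GB. This is a sketch under stated assumptions, not something the project ships: it assumes sizes come from `docker history --human=false --format '{{.Size}}' IMAGE`, and it reads "10 GB" as 10 * 10**9 bytes, which the GitHub docs don't specify. The names `check_layers`, `layer_sizes`, and `WARN_AT_BYTES` are hypothetical.

```python
import subprocess

# Assumption: GitHub's "10 GB" is read as 10 * 10**9 bytes (the docs don't say
# whether GB means 10**9 or 2**30 here).
GHCR_LAYER_LIMIT_BYTES = 10 * 10**9
# Hypothetical warning threshold for "very close to the limit".
WARN_AT_BYTES = 9 * 10**9


def layer_sizes(image):
    """Return per-layer sizes in bytes as reported by `docker history`."""
    out = subprocess.run(
        ["docker", "history", "--human=false", "--format", "{{.Size}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.split()]


def check_layers(sizes, limit=GHCR_LAYER_LIMIT_BYTES, warn_at=WARN_AT_BYTES):
    """Split layers into (over_limit, near_limit) lists of (index, size) pairs."""
    over, near = [], []
    for i, size in enumerate(sizes):
        if size > limit:
            over.append((i, size))
        elif size >= warn_at:
            near.append((i, size))
    return over, near


if __name__ == "__main__":
    # Made-up sizes: a ~9 GB toolkit layer plus two small layers.
    over, near = check_layers([120_000_000, 9_000_000_000, 4_096])
    print("over limit:", over)
    print("near limit:", near)
```

This only helps for an image that actually finishes building locally; when the build itself dies, as described above, there is no image to inspect.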

tfoote added the enhancement, help wanted, and nvidia labels Jan 25, 2025
tfoote added a commit that referenced this issue Jan 25, 2025
I'm not sure this is a good thing to add; it's undocumented and potentially fragile.
I was trying it while looking for a root cause of #309, but determined that the error was actually the build itself failing, so the image ID detection wasn't necessary.

The canonicalization, though, may be valuable to merge without the aux-reading logic.
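The commit refers to reading the image ID from the build output's "aux" messages. As a hedged illustration only (this is not the commit's code): with docker-py's low-level `docker.APIClient().build(..., decode=True)`, build output arrives as a stream of dicts, and the classic builder typically emits a message shaped like `{"aux": {"ID": "sha256:..."}}` once the image is built. The sketch checks for an `error` message first, since that is the case the investigation turned up. `image_id_from_stream` and `BuildFailed` are hypothetical names.

```python
class BuildFailed(RuntimeError):
    """Raised when the build stream reports an error (hypothetical helper)."""


def image_id_from_stream(messages):
    """Return the image ID from decoded build-stream dicts, or None if absent.

    Assumption: `messages` are dicts such as those yielded by
    docker.APIClient().build(..., decode=True). The {"aux": {"ID": ...}}
    message is not a documented contract, so treat it as fragile.
    """
    image_id = None
    for msg in messages:
        if "error" in msg:
            # A failed build produces no image, so surface the error instead of
            # waiting for an ID that will never arrive.
            raise BuildFailed(msg["error"])
        aux = msg.get("aux")
        if isinstance(aux, dict) and "ID" in aux:
            image_id = aux["ID"]
    return image_id


if __name__ == "__main__":
    # Made-up stream loosely shaped like classic-builder output.
    stream = [
        {"stream": "Step 1/2 : FROM ubuntu:22.04\n"},
        {"aux": {"ID": "sha256:0123abcd"}},
        {"stream": "Successfully built 0123abcd\n"},
    ]
    print(image_id_from_stream(stream))
```

Checking for `error` before `aux` reflects what #309 showed: the failure was the build itself, so no ID would have been produced to detect.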
tfoote added the bug label and removed the enhancement label Jan 26, 2025