@nathanielsimard nathanielsimard commented Nov 10, 2025

Requires: tracel-ai/cubecl#1021

Fixes #4004

@laggui laggui changed the title from "Set rev" to "Fix async barrier & TMA checks" Nov 10, 2025
@laggui laggui added the ci:test-gpu When applied to a Pull Request execute the `test-gpu.yml` workflow. label Nov 10, 2025
@bot-ember
Member

🏷️ Workflow test-gpu enabled 🟢
🕒 Auto-run scheduled on updates
ℹ A run has been scheduled... please wait for the results

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.35%. Comparing base (cc3ee1e) to head (ec69860).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4007   +/-   ##
=======================================
  Coverage   65.35%   65.35%           
=======================================
  Files        1183     1183           
  Lines      139391   139391           
=======================================
  Hits        91095    91095           
  Misses      48296    48296           

@bot-ember
Member

❌ Workflow CI GPU:#4007 (run #349) failure!

@bot-ember
Member

🚀️ Workflow CI GPU (run #350) started!

@bot-ember
Member

❌ Workflow CI GPU:#4007 (run #350) failure!

@laggui
Member

laggui commented Nov 10, 2025

This fixed all the CUDA errors for TMA and the Vulkan issues with the async barrier.

Only one autotune failure remains on Vulkan:

  thread 'tests::cube::tensor::f32_ty::module_conv2d::tests::test_conv2d_binary_broadcasted' (9434) panicked at /home/agent/.cargo/git/checkouts/cubecl-058c47895211d464/29f4afa/crates/cubecl-runtime/src/tune/tuner.rs:344:17:
  No autotune was flagged as valid for the problem.

Not sure why though.

@nathanielsimard nathanielsimard merged commit 06be7b2 into main Nov 10, 2025
13 of 14 checks passed
@nathanielsimard nathanielsimard deleted the disable/mma/amd branch November 10, 2025 19:15
@nathanielsimard
Member Author

> This fixed all the CUDA errors for TMA and the Vulkan issues with the async barrier.
>
> Only one autotune failure remains on Vulkan:
>
>   thread 'tests::cube::tensor::f32_ty::module_conv2d::tests::test_conv2d_binary_broadcasted' (9434) panicked at /home/agent/.cargo/git/checkouts/cubecl-058c47895211d464/29f4afa/crates/cubecl-runtime/src/tune/tuner.rs:344:17:
>   No autotune was flagged as valid for the problem.
>
> Not sure why though.

We should fix it in a new PR.

khoek pushed a commit to khoek/burn that referenced this pull request Nov 27, 2025

Development

Successfully merging this pull request may close these issues:

#4004: After this PR: https://github.com/tracel-ai/burn/pull/3986, running with the CUDA runtime on the 4090D will crash.

4 participants