Replace volatiles with atomic for SMP tests #134

Indanz · 2025-02-26T12:43:22Z

Let's see if this fixes the recent new failures and the old ones for the disabled CI tests.

Would be nice to test this with seL4/ci-actions#379, but no idea how to do that.

Indanz · 2025-02-26T20:53:51Z

Probably need to add -mno-outline-atomics to the build flags for sel4test, but I don't know how to do that with cmake. With clang 21 I'm getting:

/usr/bin/aarch64-linux-gnu-ld: /usr/lib/gcc-cross/aarch64-linux-gnu/10/libgcc.a(lse-init.o): in function init_have_lse_atomics': (.text.startup+0xc): undefined reference to __getauxval'

Would also be good to at least have -O1 or -O2, currently it's generating ridiculous assembly for debug builds.

Edit: Also need to manually set CMAKE_OBJCOPY, otherwise it will try to use the wrong one.

lsf37 · 2025-02-26T22:09:57Z

Would be nice to test this with seL4/ci-actions#379, but no idea how to do that.

Will set that up.

kent-mcleod · 2025-02-27T01:13:40Z

apps/sel4test-tests/src/tests/schedcontext.c

+void sched_context_0014_helper_fn(_Atomic int *state)
+{
+    while (1) {
+        *state = *state + 1;


Do you want this or *state += 1? The difference is whether the compiler generates either a load acquire and a separate store release or it generates an atomic increment I think (From some aarch64 compiler tests).

It could really be *state = 1 here. I wouldn't expect the compiler to generate different code though, as += 1 is just a shorthand.

If doing the increment, calling seL4_Yield() first would make the test more robust I think. So I think I'll either change it to = 1 or add the yield.

I wouldn't expect the compiler to generate different code though, as += 1 is just a shorthand.

I was looking at what _Atomic actually does to an int and an int* and on my aarch64 compiler (12.2.0), a = a + 1; generated different atomic intrinsic than a += 1. The first being a separate load acquire/store release, and the second was a looping load exclusive/store conditional

Was that with the actual build? For debug build there is no optimisation at all enabled so the generated assembly is horrible. But for optimised builds I would expect the same code, though neither gcc nor clang do:

https://godbolt.org/z/qbzxqPYTz

There's some subtle difference between the two that eludes me. For some reason neither generate atomic instructions for *num = *num + 1;.

Edit: I guess C gives stronger guarantees about ++ and += 1 than for separate reads and writes on atomics. That's nice to know, but very subtle...

Your godbolt link is what I was seeing too. += and ++ with _Atomic are treated as a __atomic_fetch_add() whereas a = a + 1 is treated as a sequence of __atomic_load_n -> add 1 -> __atomic_store_n. So there's ordering but the fetch and add and write back isn't atomic.

ldar w8, [x0] add w8, w8, #1 stlr w8, [x0]

ldar and stlr are "atomics" in that they're __atomic_load_n(num, __ATOMIC_ACQUIRE) and __atomic_store_n(num, val, __ATOMIC_RELEASE)

Ah, thanks, I missed that ldar and stlr have acquire/release semantics. As long as the store is release it shouldn't really matter for the test though.

kent-mcleod · 2025-02-27T03:36:16Z

Probably need to add -mno-outline-atomics to the build flags for sel4test, but I don't know how to do that with cmake. With clang 21 I'm getting:

It's already added for user level builds here: https://github.com/seL4/seL4_tools/blob/master/cmake-tool/helpers/environment_flags.cmake#L68

So you could change the condition to make it not conditional on a specific compiler version.

lsf37 · 2025-02-27T08:32:31Z

SCHED_CONTEXT_0014 is still there, but the other failures seem to be fixed.

Indanz · 2025-02-27T10:38:41Z

Probably need to add -mno-outline-atomics to the build flags for sel4test, but I don't know how to do that with cmake. With clang 21 I'm getting:

It's already added for user level builds here: https://github.com/seL4/seL4_tools/blob/master/cmake-tool/helpers/environment_flags.cmake#L68

Thanks, I'll add it there.

So you could change the condition to make it not conditional on a specific compiler version.

Then you get compile errors for unknown options, so we need some checks.

Indanz · 2025-02-27T10:47:23Z

Made the loop exactly the same as for the other code (so following Kent's suggestion basically), let's see if that helps.

lsf37 · 2025-02-27T21:28:46Z

Looks like we need to merge #135 first. @kent-mcleod, can that go in independently from the other ones?

kent-mcleod · 2025-02-27T21:35:30Z

Looks like we need to merge #135 first. @kent-mcleod, can that go in independently from the other ones?

No, this seems to be testing on top of an old commit of the kernel that doesn't have the change in it yet:

   sel4test-manifest: b6a33334 Updating default.xml                           (origin/master)
                seL4: 81679f2b SMP: Fix IPI Race                              (pull/871/head)
            musllibc: 9798aedb Makefile: update config.mak generation         (seL4/sel4)
           seL4_libs: 54539a72 Bump minimum CMake version                     (seL4/master)
  sel4_projects_libs: bca46738 libsel4vm: Proper error for map_vm_memory_res  (seL4/master)
         sel4runtime: fb26509f Arm: Support KernelArmTLSReg                   (seL4/master)

Why is it targeting #871 and not seL4/master?

Indanz · 2025-02-27T21:59:49Z

Why is it targeting #871 and not seL4/master?

I had a test-with in my comment, but removed it today.

kent-mcleod · 2025-02-27T22:20:00Z

I had a test-with in my comment, but removed it today.

oh right, thanks for the explaination. I thought there was some extra magic that I didn't know about happening.

Add the right TLS variable settings for the gnu-elf-abi used by our compilers. Signed-off-by: Kent McLeod <[email protected]>

Indanz · 2025-02-28T11:19:08Z

SCHED_CONTEXT_0014 might be a real problem, it's still there. Will need to look at generated assembly to be sure.

Indanz · 2025-03-01T10:05:01Z

After running the test multiple times, all failures are a combination of *_SMP_hyp_MCS_clang_64. That's too much of a coincidence to not be a real bug somewhere. Same as @lsf37's results before actually, but I missed that they were all HYP configs.

The first run SERSERV_CLI_PROC_005 failed for TX2_verification_SMP_hyp_MCS_gcc_64, that's something else to keep an eye on, but probably independent.

Signed-off-by: Indan Zupancic <[email protected]>

lsf37 · 2025-03-01T23:10:58Z

How about we merge this one, since it does improve the other tests, and I enable those other tests in CI while we are looking more closely at SCHED_CONTEXT_0014.

I have not seen SERSERV_CLI_PROC_005 fail before, that's definitely something to keep an eye on. As you says, it does not look like it is because of something in here, though.

Indanz · 2025-03-02T00:31:58Z

I did a lot of testing with tx2. Conclusion is that it's all caused by a task having MIN_BUDGET budget not making forward progress.

The test succeeds if the helper task manages to increment the counter within its budget. If it doesn't, it will hang forever and so will the whole test.

The reason it fails on some configs and not on others seems to be purely timing dependent: Some configs have more overhead and cache pressure than others.

seL4_DebugDumpScheduler() confirms my theory, the helper thread is stuck in the release queue forever at the same instruction. However, the reason why it doesn't make progress is surprising:

It's because the helper_thread function uses a floating point instruction (movi) which triggers the lazy FPU loading trap. After the trap the task doesn't have enough budget left, so it gets put into the release queue and the FPU gets disabled again. This will repeat forever.

(Edit: tx1 -> tx2)

lsf37 · 2025-03-02T21:46:58Z

Interesting. So that means that MIN_BUDGET is set too small for that config, right?

start_helper would do unnecessary complicated thread startup code which may contain FPU instructions with some compilers and configs. Normally this would be harmless (though it would thwart the test's intention), but if the task runs out of budget before it finishes the FPU instruction, it will be put on the release queue and make no forward progress (unclear why). This is only a problem for tasks with budget equal to MIN_BUDGET, any time lost because of syscalls or traps will cause the task to hang. Signed-off-by: Indan Zupancic <[email protected]>

Indanz · 2025-03-02T22:38:01Z

My theory above is wrong, lazy FPU loading probably isn't to blame, as it doesn't do a budget check or schedule. And if I add printfs to the FPU fault handling, it doesn't show up at all when the task hangs. However, my latest push without the movi works fine, but if I add asm volatile("movi v0.2d, #0x0"); just to before the while loop in the helper, it hangs without any prints.

All in all it still makes no sense and might be some subtle kernel bug after all, I don't know anymore.

Interesting. So that means that MIN_BUDGET is set too small for that config, right?

Maybe, I'm not sure. The helper task doesn't do any syscalls at all. Biggest interrupt latency I measured was 2 us, not counting IPIs, But no IPIs are sent after the helper is started, except if the test finishes and everything works.

I'm quite mystified at this point. I thought I knew what was going on yesterday, except I can't verify my theory and code review proofs me wrong.

How about we merge this one, since it does improve the other tests, and I enable those other tests in CI while we are looking more closely at SCHED_CONTEXT_0014.

With the latest commit it should work fine on all boards I think.

I have not seen SERSERV_CLI_PROC_005 fail before, that's definitely something to keep an eye on. As you says, it does not look like it is because of something in here, though.

I've seen one or the other SERSERV tests fail here and there recently, both before and after my SMP commits.

Sneaked in when style checking was temporarily broken. Signed-off-by: Indan Zupancic <[email protected]>

lsf37 · 2025-03-03T06:52:06Z

How about we merge this one, since it does improve the other tests, and I enable those other tests in CI while we are looking more closely at SCHED_CONTEXT_0014.

With the latest commit it should work fine on all boards I think.

Almost: the Odroid C4 still has SCHED_CONTEXT_0014 failures, there rest is fine:
https://github.com/seL4/sel4test/actions/runs/13620049487/job/38075368124?pr=134

Indanz · 2025-03-03T10:04:16Z

Almost: the Odroid C4 still has SCHED_CONTEXT_0014 failures, there rest is fine: https://github.com/seL4/sel4test/actions/runs/13620049487/job/38075368124?pr=134

Indeed, FAILED tests:

ODROID_C4_debug_SMP_MCS_gcc_64
ODROID_C4_debug_SMP_hyp_MCS_gcc_64
ODROID_C4_verification_SMP_MCS_gcc_64
ODROID_C4_verification_SMP_hyp_MCS_gcc_64

At least it's also non-hyp and gcc in there. Probably should rerun the tests to see if anything else fails sometimes. I'll focus my attention on the Odroic C4 now.

Indanz · 2025-03-05T23:10:31Z

Oops, merged it with smp-tests instead of master, not sure how to fix that.

lsf37 · 2025-03-06T05:00:18Z

Oops, merged it with smp-tests instead of master, not sure how to fix that.

I'll fix it on the smp-test branch (dropping that commit). We can then merge that on master. One sec.

lsf37 · 2025-03-06T05:10:07Z

Now in #136

Indanz added the hw-test set to run sel4test hardware test for this PR label Feb 26, 2025

Indanz force-pushed the volatile2atomic branch from c246e20 to efbe56a Compare February 26, 2025 20:28

Indanz added hw-build set to run all sel4test hardware builds on this PR and removed hw-test set to run sel4test hardware test for this PR labels Feb 26, 2025

Indanz mentioned this pull request Feb 26, 2025

SMP: Improve kernel lock seL4/seL4#871

Open

Indanz added the hw-test set to run sel4test hardware test for this PR label Feb 26, 2025

lsf37 changed the base branch from master to smp-tests February 26, 2025 22:22

lsf37 added hw-test set to run sel4test hardware test for this PR and removed hw-build set to run all sel4test hardware builds on this PR hw-test set to run sel4test hardware test for this PR labels Feb 26, 2025

kent-mcleod approved these changes Feb 27, 2025

View reviewed changes

kent-mcleod reviewed Feb 27, 2025

View reviewed changes

lsf37 force-pushed the smp-tests branch from 939e4c7 to 3583555 Compare February 27, 2025 01:36

Indanz force-pushed the volatile2atomic branch from efbe56a to d17d1c9 Compare February 27, 2025 10:46

Indanz mentioned this pull request Feb 27, 2025

Test MULTICORE0005 fails on SABRE sporadically #66

Closed

Indanz added hw-build set to run all sel4test hardware builds on this PR and removed hw-test set to run sel4test hardware test for this PR labels Feb 27, 2025

Indanz added hw-test set to run sel4test hardware test for this PR and removed hw-build set to run all sel4test hardware builds on this PR labels Feb 27, 2025

Indanz mentioned this pull request Feb 27, 2025

Clang: Add -mno-outline-atomics seL4/seL4_tools#221

Merged

arm,cmake: Set compiler expected tls settings

73ed59a

Add the right TLS variable settings for the gnu-elf-abi used by our compilers. Signed-off-by: Kent McLeod <[email protected]>

Replace volatiles with atomic for SMP tests

916ecab

Signed-off-by: Indan Zupancic <[email protected]>

Indanz force-pushed the volatile2atomic branch 2 times, most recently from cdc596b to 916ecab Compare March 1, 2025 16:59

lsf37 approved these changes Mar 1, 2025

View reviewed changes

Fix style errors

2438d2f

Sneaked in when style checking was temporarily broken. Signed-off-by: Indan Zupancic <[email protected]>

Indanz added hw-test set to run sel4test hardware test for this PR and removed hw-test set to run sel4test hardware test for this PR labels Mar 2, 2025

lsf37 added hw-test set to run sel4test hardware test for this PR and removed hw-test set to run sel4test hardware test for this PR labels Mar 2, 2025

Indanz mentioned this pull request Mar 5, 2025

Unexpected behaviour with minimum budget seL4/microkit#285

Open

Indanz merged commit 184d5e4 into seL4:smp-tests Mar 5, 2025
49 of 81 checks passed

lsf37 mentioned this pull request Mar 6, 2025

merge #134 into master #136

Merged

lsf37 mentioned this pull request Mar 7, 2025

ODROID_C4 fails SCHED_CONTEXT_0014 on MCS+SMP seL4/seL4#928

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace volatiles with atomic for SMP tests #134

Replace volatiles with atomic for SMP tests #134

Indanz commented Feb 26, 2025 •

edited

Loading

Indanz commented Feb 26, 2025 •

edited

Loading

lsf37 commented Feb 26, 2025

kent-mcleod Feb 27, 2025

Indanz Feb 27, 2025

kent-mcleod Feb 27, 2025

Indanz Feb 27, 2025 •

edited

Loading

kent-mcleod Feb 27, 2025

Indanz Feb 27, 2025

kent-mcleod commented Feb 27, 2025

lsf37 commented Feb 27, 2025

Indanz commented Feb 27, 2025

Indanz commented Feb 27, 2025

lsf37 commented Feb 27, 2025

kent-mcleod commented Feb 27, 2025

Indanz commented Feb 27, 2025

kent-mcleod commented Feb 27, 2025

Indanz commented Feb 28, 2025

Indanz commented Mar 1, 2025

lsf37 commented Mar 1, 2025

Indanz commented Mar 2, 2025 •

edited

Loading

lsf37 commented Mar 2, 2025

Indanz commented Mar 2, 2025

lsf37 commented Mar 3, 2025

Indanz commented Mar 3, 2025

Indanz commented Mar 5, 2025

lsf37 commented Mar 6, 2025

lsf37 commented Mar 6, 2025

Replace volatiles with atomic for SMP tests #134

Replace volatiles with atomic for SMP tests #134

Conversation

Indanz commented Feb 26, 2025 • edited Loading

Indanz commented Feb 26, 2025 • edited Loading

lsf37 commented Feb 26, 2025

kent-mcleod Feb 27, 2025

Choose a reason for hiding this comment

Indanz Feb 27, 2025

Choose a reason for hiding this comment

kent-mcleod Feb 27, 2025

Choose a reason for hiding this comment

Indanz Feb 27, 2025 • edited Loading

Choose a reason for hiding this comment

kent-mcleod Feb 27, 2025

Choose a reason for hiding this comment

Indanz Feb 27, 2025

Choose a reason for hiding this comment

kent-mcleod commented Feb 27, 2025

lsf37 commented Feb 27, 2025

Indanz commented Feb 27, 2025

Indanz commented Feb 27, 2025

lsf37 commented Feb 27, 2025

kent-mcleod commented Feb 27, 2025

Indanz commented Feb 27, 2025

kent-mcleod commented Feb 27, 2025

Indanz commented Feb 28, 2025

Indanz commented Mar 1, 2025

lsf37 commented Mar 1, 2025

Indanz commented Mar 2, 2025 • edited Loading

lsf37 commented Mar 2, 2025

Indanz commented Mar 2, 2025

lsf37 commented Mar 3, 2025

Indanz commented Mar 3, 2025

Indanz commented Mar 5, 2025

lsf37 commented Mar 6, 2025

lsf37 commented Mar 6, 2025

Indanz commented Feb 26, 2025 •

edited

Loading

Indanz commented Feb 26, 2025 •

edited

Loading

Indanz Feb 27, 2025 •

edited

Loading

Indanz commented Mar 2, 2025 •

edited

Loading