Nightly Valgrind Matrix #4421

reneme · 2024-10-30T08:36:34Z

This sets up a configuration matrix for the nightly Valgrind run, in the hope of generating a stronger signal from the CT::poison() helpers. For that, I introduced two new build targets, 'valgrind-ct' and 'valgrind-ct-full'. Currently, their only difference from the non-ct targets is that they don't enable the --leak-check=full, --show-reachable and --track-origins Valgrind features.

Notably, the clang -Os variant would have been able to detect #4107, as evidenced by the ct_selftest.py test "clang_vs_bare_metal_ct_mask... ok", which reproduced this issue. This particular self-test is skipped in all other configurations.
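A minimal sketch of why the self-test needs to know the build configuration: the regression test for #4107 only makes sense for clang with -Os, so something has to decide per configuration whether to run it. The dictionary keys `compiler` and `optimization` and the function name are hypothetical; this does not reflect how ct_selftest.py actually receives or parses its configuration.

```python
from itertools import product


def should_run_ct_mask_regression(build_config: dict) -> bool:
    """Hypothetical gate for the 'clang_vs_bare_metal_ct_mask' self-test.

    The compiler-induced side channel from #4107 was only reproduced with
    clang at -Os, so every other configuration skips the test. Without a
    build config (empty dict) the gate cannot tell and skips as well.
    """
    return (build_config.get("compiler") == "clang"
            and build_config.get("optimization") == "-Os")


# Enumerate the GCC/clang x -O1..-Os matrix and show which cells run the test.
configs = [{"compiler": c, "optimization": o}
           for c, o in product(("gcc", "clang"), ("-O1", "-O2", "-O3", "-Os"))]
for cfg in configs:
    status = "run" if should_run_ct_mask_regression(cfg) else "skip"
    print(f"{cfg['compiler']:5} {cfg['optimization']}: {status}")
```

Exactly one cell of the matrix runs the test, which is why the self-test cannot make this decision without being told the configuration.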

Below are some runtimes. Note that the compilation results were cached in ccache (from previous test runs), so these runtimes mostly consist of runner setup, cached compilation and the runtime of the Valgrind-hosted test suite.

	GCC	clang
-O1	52min	63min
-O2	54min	59min
-O3	49min	50min
-Os	63min 🔸	59min

🔸 means that it ran a reduced set of tests (valgrind-ct).

The valgrind configuration matrix also contains the existing 'valgrind-full' run on clang with -O3. This is the only configuration that explicitly aims to find memory bugs, and it enables the --leak-check=full, --show-reachable and --track-origins Valgrind features.
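To make the shape of the matrix concrete, here is a small Python sketch that enumerates the jobs described above. It is a hypothetical representation, not the actual CI configuration: the job dictionaries and `build_matrix` are made up, -O0 is omitted as in the runtime table, and the unmarked cells of that table are assumed to use 'valgrind-ct-full'.

```python
from itertools import product

COMPILERS = ("gcc", "clang")
OPT_LEVELS = ("-O1", "-O2", "-O3", "-Os")


def build_matrix() -> list:
    """Enumerate the nightly Valgrind jobs (hypothetical representation)."""
    jobs = []
    for compiler, opt in product(COMPILERS, OPT_LEVELS):
        # GCC -Os was the one cell that ran the reduced test set (valgrind-ct);
        # the other cells are assumed to run valgrind-ct-full.
        if (compiler, opt) == ("gcc", "-Os"):
            target = "valgrind-ct"
        else:
            target = "valgrind-ct-full"
        jobs.append({"compiler": compiler, "opt": opt, "target": target})
    # The pre-existing memory-bug job: clang -O3 with the full Memcheck flags.
    jobs.append({"compiler": "clang", "opt": "-O3", "target": "valgrind-full"})
    return jobs


for job in build_matrix():
    print(f"{job['compiler']:5} {job['opt']:3} {job['target']}")
```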

Here are the now-outdated runtimes, measured before switching off --leak-check, --show-reachable and --track-origins, and compiled with a cold ccache:

	GCC	clang
-O0	❌	❌
-O1	1h35m	1h43m
-O2	1h34m	1h33m
-O3	1h30m	1h36m
-Os	2h16m 🔸	1h50m

❌ means that the job failed with a timeout after 6 hours.
🔸 means that it timed out with valgrind-full but succeeded with valgrind.

Closes #4396

randombit · 2024-10-30T14:35:10Z

I think the no-optimizations case is not really that useful; if someone is using that in production they are insane, since performance will be terrible. We quite explicitly rely on the compiler performing extensive inlining/etc.

The same logic applies to -O1, but there the performance seems tolerable (considering).

There are two completely distinct reasons we're running valgrind:

- Detecting memory errors/leaks, the normal valgrind stuff
- Detecting const-time violations

We should distinguish these in the build:

- valgrind runs all tests, with our current set of valgrind flags (--leak-check=full, --show-reachable, --track-origins; recall that these add significant overhead to running valgrind), and using our standard compilation flags.
- valgrind-ct (whatever) runs only specifically designated tests, and without the leak check/origin tracking flags, across a range of targets.
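The split proposed above boils down to which Memcheck options each target passes. A sketch, assuming a hypothetical `valgrind_command` helper and using `./botan-test` purely as an illustrative binary name. The three options are the ones named in this thread, written in their explicit `=full`/`=yes` forms, and `--error-exitcode=1` is added so that any reported error fails the job.

```python
# Memcheck options that hunt memory bugs; they add significant overhead.
MEMCHECK_FLAGS = ["--leak-check=full", "--show-reachable=yes", "--track-origins=yes"]

KNOWN_TARGETS = {"valgrind", "valgrind-full", "valgrind-ct", "valgrind-ct-full"}


def valgrind_command(target: str, test_binary: str, tests=()) -> list:
    """Assemble a valgrind invocation for one CI target (hypothetical helper)."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown target: {target}")
    # Fail the job whenever valgrind reports an error, e.g. a conditional
    # jump that depends on memory CT::poison() marked as undefined.
    cmd = ["valgrind", "--error-exitcode=1"]
    # Only the memory-bug targets pay for leak checking and origin tracking;
    # the *-ct targets only need the uninitialized-value checks.
    if not target.startswith("valgrind-ct"):
        cmd += MEMCHECK_FLAGS
    cmd.append(test_binary)
    # Assumption: passing no test names makes the test binary run everything.
    cmd += list(tests)
    return cmd


print(" ".join(valgrind_command("valgrind-ct", "./botan-test")))
print(" ".join(valgrind_command("valgrind-full", "./botan-test")))
```

Keeping the flag selection in one place makes it cheap to tweak the CT matrix's options later without touching the memory-bug job.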

reneme · 2024-10-30T14:54:46Z

I think the no optimizations case is not really that useful; if someone using that in production they are insane since performance will be terrible.

I wanted to agree, but the -O0 build did actually find something that may or may not be worth fixing: 67964a8

reneme · 2024-10-30T14:58:52Z

valgrind-ct (whatever) runs only specifically designated tests, and without the leak check/origin tracking flags, across a range of targets.

Given that the runtime is bearable (for a nightly job and excluding -O0), I am wondering whether it is really worth the effort of splitting these out, except perhaps to tweak the valgrind flags for the CT matrix. But I'd put a question mark on hand-picking tests, as we'll inevitably forget to add relevant things to that list in the future.

Both clang and GCC generate a conditional jump on a secret-dependent value when compiling without any optimizations (-O0)! My best guess is that the conjunction of two bools is canonically compiled to *jump* over the evaluation of the second operand when the first one was already found to be false. Optimizers will figure out that this can be implemented using an 'and' instruction for two bool operands.
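The difference can be illustrated with a small Python analogy. Python's `and` short-circuits just like C/C++ `&&`, so whether the second operand is evaluated at all depends on the value of the first; a bitwise `&` on two already-computed bools has no such data-dependent step at the source level. This only models evaluation order, not the machine code a compiler emits, and the function names are made up for the example.

```python
def and_short_circuit(a: bool, second) -> bool:
    """Models unoptimized C/C++ `a && b`: `second` is only called if `a` is
    true, so which code runs depends on the (secret) value of `a`."""
    return a and second()


def and_bitwise(a: bool, b: bool) -> bool:
    """Models the optimized form: both operands are already evaluated and
    combined with a bitwise AND, so evaluation does not depend on them."""
    return bool(a & b)


calls = []


def second_operand() -> bool:
    calls.append(1)  # record that the second operand was evaluated
    return True


and_short_circuit(False, second_operand)  # second operand skipped
and_short_circuit(True, second_operand)   # second operand evaluated
print(len(calls))  # 1: the number of evaluations reveals the value of `a`
```

Under Valgrind, the equivalent jump in the -O0 binary is exactly what gets reported when `a` was derived from poisoned memory.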

This aims to increase the signal of secret-dependent execution gained from the CT::poison() helpers by running these Valgrind-based tests across a range of compilers and optimization flags. For instance, randombit#4107 would have been detectable using this matrix.

coveralls · 2024-10-31T14:16:35Z

coverage: 91.073% (remained the same) when pulling cdd6fc3 on Rohde-Schwarz:ci/valgrind_matrix into 2ae990f on randombit:master.

randombit

Looks good thanks

reneme added the infra CI, package management, etc label Oct 30, 2024

reneme self-assigned this Oct 30, 2024

reneme force-pushed the ci/valgrind_matrix branch from 011d3fe to a2eaafc on October 30, 2024 13:15

reneme force-pushed the ci/valgrind_matrix branch from d270d43 to 0735ce1 on October 31, 2024 12:31

reneme added 5 commits October 31, 2024 14:39

Remove Valgrind flags from ct_selftest

9b4717a

FIX: pass the build config to ct_selftest.py

086ae44

Without this, the self-test won't be able to determine when to run the regression test on a compiler-induced side channel under -Os.

disable awfully-slow Dilithium/ML-DSA KAT tests for 'valgrind'

5e1fdc4

These are particularly slow for -O0, hence we run only the smallest configuration in the 'valgrind' target and leave the others to 'valgrind-full'.
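A sketch of the test selection this commit describes, assuming the KATs are grouped by parameter set. The ML-DSA parameter-set names are the FIPS 204 ones, but the list, its smallest-first ordering and the `select_kat_params` helper are illustrative assumptions rather than Botan's actual test layout.

```python
# Illustrative list of ML-DSA parameter sets, ordered from smallest to largest.
ML_DSA_KAT_PARAMS = ["ML-DSA-44", "ML-DSA-65", "ML-DSA-87"]


def select_kat_params(target: str) -> list:
    """Hypothetical filter: under plain 'valgrind' only the smallest parameter
    set runs; 'valgrind-full' (and any other target) keeps all of them."""
    if target == "valgrind":
        return ML_DSA_KAT_PARAMS[:1]
    return list(ML_DSA_KAT_PARAMS)


for target in ("valgrind", "valgrind-full"):
    print(target, select_kat_params(target))
```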

reneme force-pushed the ci/valgrind_matrix branch from 0735ce1 to cdd6fc3 on October 31, 2024 13:44

reneme marked this pull request as ready for review October 31, 2024 13:53

reneme requested a review from randombit October 31, 2024 13:56

randombit approved these changes Oct 31, 2024

reneme merged commit 9917532 into randombit:master Nov 1, 2024
38 checks passed

reneme deleted the ci/valgrind_matrix branch November 1, 2024 05:50
