Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly Valgrind Matrix #4421

Merged
merged 5 commits into from
Nov 1, 2024
Merged

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Oct 30, 2024

This sets up a configuration matrix for the nightly Valgrind run. In the hope to generate a stronger signal from the CT::poison() helpers. For that, I introduced two new build targets 'valgrind-ct' and 'valgrind-ct-full'. Currently, as their only difference compared to the non-ct targets, they don't enable the --leak-check=full, --show-reachables and --track-origins Valgrind features.

Explicitly note that the clang -Os variant would have been able to detect #4107, as evident by the ct_selftest.py test "clang_vs_bare_metal_ct_mask... ok" which reproduced this issue. This particular self-test is skipped for all other configurations.

Below are some runtimes. Note that the compiliation results were cached in ccache (from previous test runs). So these runtimes are mostly comprised of runner setup, cached compilation and valgrind-hosted test suite runtime.

GCC clang
-O1 52min 63min
-O2 54min 59min
-O3 49min 50min
-Os 63min 🔸 59min

🔸, means that it ran a reduced set of tests (valgrind-ct)

The valgrind configuration matrix also contains the existing 'valgrind-full' run on clang with -O3. This is the only configuration that explicitly aims to find memory bugs and enables the --leak-check=full, --show-reachables and --track-origins Valgrind features.


Here are, now outdated, runtimes (before switching off --leak-check, --show-reachables, --track-origins) and compiling with a cold ccache:

GCC clang
-O0
-O1 1h35m 1h43m
-O2 1h34m 1h33m
-O3 1h30m 1h36m
-Os 2h16m 🔸 1h50m

❌, means that they failed in a timeout after 6 hours.
🔸, means that it timed out with valgrind-full but succeeded with valgrind

Closes #4396

@reneme reneme added the infra CI, package management, etc label Oct 30, 2024
@reneme reneme self-assigned this Oct 30, 2024
@randombit
Copy link
Owner

I think the no optimizations case is not really that useful; if someone using using that in production they are insane since performance will be terrible, we quite explicitly rely on the compiler performing extensive inlining/etc.

Same logic applies to -O1, but there the performance seems tolerable (considering).

There are two completely distinct reasons we're running valgrind

  • Detecting memory errors/leaks, the normal valgrind stuff
  • To detect const time violations

We should distinguish these in the build

  • valgrind runs all tests, with our current set of valgrind flags (--leak-check=full, --show-reachable, --track-origins - recall that these add significant overhead to running valgrind), and using our standard compilation flags.
  • valgrind-ct (whatever) runs only specifically designated tests, and without the leak check/origin tracking flags, across a range of targets.

@reneme
Copy link
Collaborator Author

reneme commented Oct 30, 2024

I think the no optimizations case is not really that useful; if someone using that in production they are insane since performance will be terrible.

I wanted to agree, but the -O0 did actually find something that may or may not be worth fixing: 67964a8

@reneme
Copy link
Collaborator Author

reneme commented Oct 30, 2024

valgrind-ct (whatever) runs only specifically designated tests, and without the leak check/origin tracking flags, across a range of targets.

Given that the runtime is bearable (for a nightly job and excluding -O0), I am wondering whether it is really worth the effort of splitting these out. Perhaps to tweak the valgrind flags for the CT-matrix. But I'd put a question mark on the hand-picked tests, as we'll inevitably miss adding relevant things to this list in the future.

Without this, the self-test won't be able to determine when
to run the regression test on a compiler-induced side channel
under -Os.
Both clang and GCC generate a conditional jump on a secret-dependent
value when compiling without any optimizations (-O0)!
My best guess is that the conjunction of two bools is generated
canonically *jumping* over the evaluation of the second opperand when
the first was already found to be false. Optimizers will figure out
that this can be implemented using an 'and' instruction for two bool
operands.
These are particularly slow for -O0, hence we run only the smallest
configuration in the 'valgrind' target and leave the others to
'valgrind-full'.
This aims to increase the signal of secret-dependent execution gained
from the CT::poison() helpers by running these Valgrind-based tests
across a range of compilers and optimization flags.

For instance randombit#4107 would have been detectable using this matrix.
@reneme reneme marked this pull request as ready for review October 31, 2024 13:53
@reneme reneme requested a review from randombit October 31, 2024 13:56
@coveralls
Copy link

Coverage Status

coverage: 91.073%. remained the same
when pulling cdd6fc3 on Rohde-Schwarz:ci/valgrind_matrix
into 2ae990f on randombit:master.

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks

@reneme reneme merged commit 9917532 into randombit:master Nov 1, 2024
38 checks passed
@reneme reneme deleted the ci/valgrind_matrix branch November 1, 2024 05:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infra CI, package management, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set up a nightly build matrix for valgrind-based SCA
3 participants