Verification failures are printed in label order, making investigating failures difficult #632

Alan-Jowett · 2024-05-10T15:58:36Z

Failures in large eBPF programs with complex control flow can be hard to analyze. Of particular concern is that an assertion failure can result in invariants being removed from the post-invariant set, which causes a cascade of subsequent assertion failures which strip invariants.

Locating the initial invariant violation that triggers this cascade of failures can be challenging assuming the failures are printed in label order (as labels don't necessarily match the control flow).

One possible solution is to provide an option to print errors in WTO (weak topographical order), which will tend to place initial failures earlier in the list of failures.

A recent case in point is a verification failure in a Cilium program, where a single correlated branch failure triggers around 500 subsequent failures. Because of significant inlining and branch folding by LLVM, the initial failure was present in the middle of the list of 500 failures making it difficult to locate.

When sorting the errors in WTO, the failure is close to the top of the list.

elazarg · 2024-05-11T18:16:20Z

The flag --assume-assert is designed to help in this exact case. Doesn't it?

elazarg · 2024-05-22T16:04:54Z

@Alan-Jowett did you experiment with that flag?

Alan-Jowett · 2024-05-22T16:08:03Z

Thanks for the feedback. What does this flag do? Should it be on by default when performing the analysis? I will try it out.

elazarg · 2024-05-22T16:14:38Z

The flag forces the invariants to match the assertions, even if they did not. So errors are not repeated and not compounded.

It used to be the default. I don't remember the reasoning behind disabling it, but in part it's because expected failures in tests stress the code less (since many invariants drop to bottom immediately).

Please let me know if the flag helps.

Alan-Jowett · 2024-05-22T16:25:59Z

That flag is awesome!

From 507 errors -> 1 error, and it's the actual relevant error that is causing the problem. This is even better than my approach. Is there any harm in making this the default in eBPF-for-Window's calls to the verifier?

I will go ahead and close this PR unless you see value in it.

elazarg · 2024-05-22T16:28:39Z

I think it should be on by default during debugging, and off by default in tests.

Alan-Jowett · 2024-05-22T16:30:22Z

Thanks!

Alan-Jowett mentioned this issue May 10, 2024

Print failures in wto for easier debugging #633

Closed

Alan-Jowett closed this as completed May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verification failures are printed in label order, making investigating failures difficult #632

Verification failures are printed in label order, making investigating failures difficult #632

Alan-Jowett commented May 10, 2024

elazarg commented May 11, 2024 •

edited

elazarg commented May 22, 2024

Alan-Jowett commented May 22, 2024 •

edited

elazarg commented May 22, 2024 •

edited

Alan-Jowett commented May 22, 2024

elazarg commented May 22, 2024

Alan-Jowett commented May 22, 2024

Verification failures are printed in label order, making investigating failures difficult #632

Verification failures are printed in label order, making investigating failures difficult #632

Comments

Alan-Jowett commented May 10, 2024

elazarg commented May 11, 2024 • edited

elazarg commented May 22, 2024

Alan-Jowett commented May 22, 2024 • edited

elazarg commented May 22, 2024 • edited

Alan-Jowett commented May 22, 2024

elazarg commented May 22, 2024

Alan-Jowett commented May 22, 2024

elazarg commented May 11, 2024 •

edited

Alan-Jowett commented May 22, 2024 •

edited

elazarg commented May 22, 2024 •

edited