Minimal unwinding information (DWARF CFI) checker #145633

amsen20 · 2025-06-25T04:47:29Z

This PR adds a minimal version of UnwindInfoChecker described in here.

This implementation looks into the modified registers by each instruction and checks:

If one of them is the CFA register, and it informs the user if the CFA data is not modified as well.
If one of them is used in another register's unwinding rule, it informs the user if the unwind info is not modified after this instruction.

This implementation does not support DWARF expressions and treats them as unknown unwinding rules.

…mple CFI analysis

…/Write conditions

- Changing the register and not informing it using cfi directives - Changing the CFA and report it wrong - Load a register from another register stored location

…mer.h

…d `FunctionUnitAnalyzer`. Define `FunctionUnitUnwindInfoAnalyzer` that executes the Analysis during the stream. The main reasons for doing this is: - This is compatible with other use cases of `MCStreamer` implementers (e.g. `MCAsmStreamer` and `MCAssembler`). - This approach result in on the fly analysis, if it was implemented like `RecordStreamer`, then all the instruction of the whole program have to be copied and be stored in the memory.

amsen20 · 2025-06-25T04:48:30Z

Please help me in:

Breaking this implementation in re-viewable parts
Testing each part of the implementation properly

It's my first non-mechanical PR, so any suggestion, even little is appreciated.

github-actions · 2025-06-25T04:51:07Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h llvm/include/llvm/UnwindInfoChecker/FunctionUnitStreamer.h llvm/include/llvm/UnwindInfoChecker/FunctionUnitUnwindInfoAnalyzer.h llvm/include/llvm/UnwindInfoChecker/UnwindInfoAnalysis.h llvm/include/llvm/UnwindInfoChecker/UnwindInfoState.h llvm/lib/UnwindInfoChecker/FunctionUnitAnalyzer.cpp llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp llvm/lib/UnwindInfoChecker/FunctionUnitUnwindInfoAnalyzer.cpp llvm/lib/UnwindInfoChecker/Registers.h llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp llvm/lib/UnwindInfoChecker/UnwindInfoState.cpp llvm/include/llvm/DebugInfo/DWARF/DWARFCFIProgram.h llvm/include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h llvm/tools/llvm-mc/llvm-mc.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp b/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
index 54389aa7f..775ca35eb 100644
--- a/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
+++ b/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
@@ -17,7 +17,8 @@
 
 using namespace llvm;
 
-std::pair<unsigned, unsigned> CFIFunctionFrameStreamer::updateDirectivesRange() {
+std::pair<unsigned, unsigned>
+CFIFunctionFrameStreamer::updateDirectivesRange() {
   auto Frames = getDwarfFrameInfos();
   unsigned CurrentCFIDirectiveIndex = 0;
   if (hasUnfinishedDwarfFrameInfo()) {
@@ -42,7 +43,8 @@ void CFIFunctionFrameStreamer::updateAnalyzer() {
     return;
   }
 
-  const MCDwarfFrameInfo *LastFrame = &getDwarfFrameInfos()[FrameIndices.back()];
+  const MCDwarfFrameInfo *LastFrame =
+      &getDwarfFrameInfos()[FrameIndices.back()];
 
   auto DirectivesRange = updateDirectivesRange();
   ArrayRef<MCCFIInstruction> Directives;
@@ -62,7 +64,7 @@ void CFIFunctionFrameStreamer::updateAnalyzer() {
 }
 
 void CFIFunctionFrameStreamer::emitInstruction(const MCInst &Inst,
-                                           const MCSubtargetInfo &STI) {
+                                               const MCSubtargetInfo &STI) {
   updateAnalyzer();
   LastInstruction = Inst;
 }
diff --git a/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp b/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
index 872793015..ab615fbe6 100644
--- a/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
+++ b/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
@@ -76,9 +76,9 @@ getUnwindRuleRefReg(const dwarf::UnwindTable::const_iterator &UnwindRow,
   }
 }
 
-DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context,
-                                       MCInstrInfo const &MCII, bool IsEH,
-                                       ArrayRef<MCCFIInstruction> Prologue)
+DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context, MCInstrInfo const &MCII,
+                                   bool IsEH,
+                                   ArrayRef<MCCFIInstruction> Prologue)
     : Context(Context), MCII(MCII), MCRI(Context->getRegisterInfo()),
       State(Context), IsEH(IsEH) {
 
@@ -121,7 +121,7 @@ DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context,
 }
 
 void DWARFCFIAnalysis::update(const MCInst &Inst,
-                                ArrayRef<MCCFIInstruction> Directives) {
+                              ArrayRef<MCCFIInstruction> Directives) {
   const MCInstrDesc &MCInstInfo = MCII.get(Inst.getOpcode());
 
   auto MaybePrevRow = State.getCurrentUnwindRow();

amsen20 · 2025-06-25T05:00:40Z

I see the following plan fit for splitting the implementation:

Edits to DebugInfo can be their own PR, they are minor as well.
FunctionUnitStreamer and FunctionUnitAnalyzer can be moved to llvm/MC/, like other MCStreamer implementers, and they can be tested with unit testing separately.
The remaining can stay in this PR, depending on the previous two independent PRs.

ilovepi

This is quite a large patch. It may make sense to split out parts that aren't required for complete functionality into their own patches.

For example, making addInstruction() public in DWARFCFIProgram.h, can probably be its own patch. I think there are several places of where you're just slightly refactoring the existing code.

As for the rest ... Its rather large, and reviewing incremental changes as you add functionality would make it significantly easier to provide high quality reviews.

ilovepi · 2025-06-25T18:48:54Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h

+#define LLVM_UNWINDINFOCHECKER_FUNCTIONUNITANALYZER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include <cstdio>


This doesn't appear to be needed.

ilovepi · 2025-06-25T18:49:41Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h

+
+} // namespace llvm
+
+#endif


Please check your editor settings, since github seems unhappy w/ the lack of newline here.

ilovepi · 2025-06-25T18:53:54Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitStreamer.h

+  std::vector<unsigned> FrameIndices;
+
+  unsigned LastDirectiveIndex;
+  std::optional<MCInst> LastInstruction;
+  std::unique_ptr<FunctionUnitAnalyzer> Analyzer;


We normally put private fields at the bottom of the class. Also please consider field ordering. using unsigned as the second field is probably going to end up w/ padding bytes.

ilovepi · 2025-06-25T18:57:35Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitUnwindInfoAnalyzer.h

+#include "FunctionUnitAnalyzer.h"
+#include "UnwindInfoAnalysis.h"
+#include "llvm/ADT/ArrayRef.h"
+#include <cstdio>


Suggested change

#include <cstdio>

I can't imagine why you'd need cstdio...

ilovepi · 2025-06-25T18:58:38Z

llvm/include/llvm/UnwindInfoChecker/UnwindInfoAnalysis.h

+#include "llvm/MC/MCStreamer.h"
+#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MC/TargetRegistry.h"
+#include <set>


https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc

Probably using an LLVM ADT is more appropriate.

ilovepi · 2025-06-25T19:23:45Z

llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp

+    return;
+  }
+
+  const auto *LastFrame = &getDwarfFrameInfos()[FrameIndices.back()];


Nit: generally prefer to use a real type over auto. using statements are a good option if you have some types that are hard to spell, plus that usually makes the code easier to read anyway. The style guide has more to say here: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

ilovepi · 2025-06-25T19:40:37Z

llvm/lib/UnwindInfoChecker/Registers.h

+    }
+  }
+
+  std::sort(SuperRegs.begin(), SuperRegs.end());


Suggested change

std::sort(SuperRegs.begin(), SuperRegs.end());

llvm::sort(SuperRegs.begin(), SuperRegs.end());

Default to llvm::sort over std::sort: https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

ilovepi · 2025-06-25T19:42:53Z

llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp

+  case dwarf::UnwindLocation::Location::Constant:
+  case dwarf::UnwindLocation::Location::Unspecified:
+  case dwarf::UnwindLocation::Location::DWARFExpr:
+    // TODO here should look into expr and find the registers.


Suggested change

// TODO here should look into expr and find the registers.

// TODO: here should look into expr and find the registers.

Most tooling works on TODO: or FIXME:. Please update the other instances.

statham-arm · 2025-06-26T09:03:17Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h

+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file declares CFIFunctionFrameReceiver class.


It's difficult to read all of this new code without an overview of what all these classes are for. Could you expand the comments in your new header files to describe the purpose of each class, not just its name?

Here, for example, it looks as if CFIFunctionFrameReceiver does basically nothing, because it's just a base class. Some other class (which?) will be given a pointer to an instance of this base class, and will send it information about a function it's scanning, but really the pointer will point to a subclass which actually does something with the information.

But I had to look in the implementation source file to confirm that, by seeing that all the methods are empty.

statham-arm · 2025-06-26T09:04:55Z

llvm/tools/llvm-mc/llvm-mc.cpp

@@ -519,8 +526,16 @@ int main(int argc, char **argv) {
  std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());
  assert(MCII && "Unable to create instruction info!");

+  std::unique_ptr<FunctionUnitUnwindInfoAnalyzer> FUUIA(


What's a FunctionUnitUnwindInfoAnalyzer? The class defined in FunctionUnitUnwindInfoAnalyzer.h is called something else. Surely this code won't even compile in its current state?

Separately from that, you can make this unique_ptr without having to repeat the enormously long class name twice by writing

auto FUUIA = std::make_unique<ClassName>(parameters);

statham-arm · 2025-06-26T09:18:17Z

llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h

@@ -0,0 +1,49 @@
+//===----------------------------------------------------------------------===//


It looks as if you renamed all the classes defined in these header files, but left the file names themselves unchanged, so that now the file names don't match the class names and there's twice as much to remember?

statham-arm · 2025-06-26T09:25:26Z

llvm/lib/UnwindInfoChecker/UnwindInfoState.cpp

+}
+
+std::optional<dwarf::CFIProgram>
+DWARFCFIState::convert(MCCFIInstruction Directive) {


Why is this function returning a std::optional, when it doesn't ever seem to return nullopt? As far as I can see, you always make a CFIProgram at the start, and always return it at the end. You may or may not put an instruction into it, but you still return the CFIProgram even if it's empty.

statham-arm · 2025-06-26T09:33:32Z

llvm/lib/UnwindInfoChecker/UnwindInfoState.cpp

+    return;
+  }
+  Table.insert(Table.end(), NewRows.begin(), NewRows.end());
+  Table.push_back(Row);


I don't understand this function at all, I'm afraid. It looks as if you first make Row by copying the last entry already in Table (if any); then you parse it into some more rows that you append to Table; and then you push Row on to the end of Table again. Wasn't it in Table already? Isn't that where you got it from in the first place?

statham-arm · 2025-06-26T09:48:04Z

llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp

+  if (!MaybeLLVMReg) {
+    assert(PrevLoc == NextLoc &&
+           "The dwarf register does not have a LLVM number, so the unwind info "
+           "for it should not change");


Why are all of these checks assertions, rather than Context->reportWarning? Is this information expected to be consistent in all of these ways because your code constructed it itself?

statham-arm · 2025-06-26T09:50:45Z

llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp

+  switch (PrevLoc.getLocation()) {
+  case dwarf::UnwindLocation::Same:
+  case dwarf::UnwindLocation::RegPlusOffset:
+    if (Writes.count(MaybePrevRefReg.value())) {


How come you don't need to check whether MaybePrevRefReg has a value at all in this test?

statham-arm · 2025-06-26T09:56:48Z

llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp

+    Context->reportWarning(
+        Inst.getLoc(),
+        formatv("unknown change happened to %{0} unwinding rule structure",
+                RegName));


I don't understand what's going on here, I'm afraid. From this code it looks as if, any time PrevLoc != NextLoc, we report one of the two warnings above, and it's only a question of which one. But that makes no sense – surely a register's location should change sometimes, e.g. when we push it on the stack and update the CFI to record that we did so. And one of your tests (spill-two-reg.s) looks as if it's got a push instruction that passes this test. But I don't see how it passes this test.

Can you explain how this code is supposed to tell the difference between a correct and an incorrect way to push a register and update its CFI?

statham-arm · 2025-06-26T10:21:12Z

llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp

+    if (PrevLoc.getLocation() == NextLoc.getLocation()) {
+      Context->reportWarning(
+          Inst.getLoc(),
+          formatv("unknown change happened to %{0} unwinding rule values",


These % prefixes on register names in your error messages are very x86-specific. In Arm, or many other architectures, register names aren't prefixed with %. Even in x86 they aren't, if you use the Intel rather than AT&T syntax. I think it would be better to leave the % off completely.

statham-arm · 2025-06-26T10:25:39Z

llvm/test/UnwindInfoChecker/X86/spill-two-reg.s

@@ -0,0 +1,43 @@
+# RUN: llvm-mc %s --validate-cfi --filetype=null


It looks as if this test is supposed to run without producing any warnings. Would it be a good idea to run FileCheck anyway, with a CHECK-NOT: warning: to make sure a warning doesn't show up by accident?

Amirhossein Pashaeehir added 30 commits June 25, 2025 03:55

Forward CFI directives alongside their associated instruction to a si…

5b78ea0

…mple CFI analysis

Create a new CFI analysis for each dwarf separately

b781e83

Separate the logic and display

8b43f8c

Compile llvm mc depending on bolt

6790f90

Extract semantic info on MCInst from bolt

75f3b2d

Typo in evaluateStackOffsetExpr doc

a3d08d8

Add a DS to keep track of each register CFI state

7a10923

Check if the CFA change is correct or not

5841b8f

Add the CFI validation as a option to llvm-mc tool

69a8e9d

Provide the directives as an arrayref instead of index range

43b54fd

Remove TODOs

474ff1c

Wrap all the MCPB info needed into a ExtendedMCInstrAnalysis class

8ea5d9c

Use extended MCAnalysis instead of MCplus (directly)

e92005c

Make the CFA checker to first generate, if not matched check the Read…

7527ba2

…/Write conditions

Check register states as well

3d65eeb

Process prologue as well

79c1664

Ignore flag registers as well

02410ac

Enable dump

dfdd22f

Enable getting flag register through extended MCInstrAnalysis

4fd37bf

Move the isolated MC semantic methods to ExtendedMCAnalysis

07d38b2

Remove unused includes

eb06aa4

Test cfi validator on following cases:

e7c3140

- Changing the register and not informing it using cfi directives - Changing the CFA and report it wrong - Load a register from another register stored location

Prune the CFA diff checker to only check read/write

6347ef4

Move the CFI analysis headers to MC

453917b

Move all the CFI analysis implementations to lib

9f8bca8

Remove stack pointer method

2c233f9

Comment out Ignoring flags at the beginning CFI analysis

12faad8

Move the necessary MCStreamer interface methods to CFIAnalysisMCStrea…

282cb77

…mer.h

Make the update method const

903e3fb

Move CFI analysis out of MC layer

7bb20f8

Amirhossein Pashaeehir added 16 commits June 25, 2025 03:56

Fix the CFIProgram usage temporary

0a8f747

Minimize check reg diff arguments

48472ac

Rename the library to UnwindInfoChecker

7576857

Fix the header guards

374f144

Add file headers

e42907f

Resolve warnings

37858ef

Refactor UnwindInfoAnalysis

b8cdef6

Move tests to UnwindInfoChecker folder

c1538d3

Rename UnwindHistory to UnwindState

88952b0

Make different methods to add instruction public

017d890

Revert bolt change

76b96dd

Revert bolt changes

60689b3

Change UnwindTable to RowContainer

5b3475b

Refactor UnwindInfoState

6bcc0cc

Revert unnecessary changes to llvm-mc

439c1d0

Lint

dfddfea

Pass the tests

876f5a6

petrhosek requested review from statham-arm, frobtech and ilovepi June 25, 2025 16:50

ilovepi reviewed Jun 25, 2025

View reviewed changes

Amirhossein Pashaeehir added 2 commits June 25, 2025 21:24

Fix weird end file formatting issues

9546eac

Apply suggestions

f419eb6

amsen20 mentioned this pull request Jun 25, 2025

[NFC] Make unwind table internal data structures more accessible #145804

Open

Make class names more specific

ee98d82

statham-arm reviewed Jun 26, 2025

View reviewed changes

	std::sort(SuperRegs.begin(), SuperRegs.end());
	llvm::sort(SuperRegs.begin(), SuperRegs.end());

	// TODO here should look into expr and find the registers.
	// TODO: here should look into expr and find the registers.

		@@ -0,0 +1,49 @@
		//===----------------------------------------------------------------------===//

		@@ -0,0 +1,43 @@
		# RUN: llvm-mc %s --validate-cfi --filetype=null

Minimal unwinding information (DWARF CFI) checker #145633

Are you sure you want to change the base?

Minimal unwinding information (DWARF CFI) checker #145633

Conversation

amsen20 commented Jun 25, 2025

Uh oh!

amsen20 commented Jun 25, 2025

Uh oh!

github-actions bot commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amsen20 commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ilovepi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Jun 25, 2025 •

edited

Loading

amsen20 commented Jun 25, 2025 •

edited

Loading