Skip to content

Minimal unwinding information (DWARF CFI) checker #145633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 62 commits into
base: main
Choose a base branch
from

Conversation

amsen20
Copy link
Contributor

@amsen20 amsen20 commented Jun 25, 2025

This PR adds a minimal version of UnwindInfoChecker described in here.

This implementation looks into the modified registers by each instruction and checks:

  • If one of them is the CFA register, and it informs the user if the CFA data is not modified as well.
  • If one of them is used in another register's unwinding rule, it informs the user if the unwind info is not modified after this instruction.

This implementation does not support DWARF expressions and treats them as unknown unwinding rules.

Amirhossein Pashaeehir added 30 commits June 25, 2025 03:55
- Changing the register and not informing it using cfi directives
- Changing the CFA and report it wrong
- Load a register from another register stored location
Amirhossein Pashaeehir added 16 commits June 25, 2025 03:56
…d `FunctionUnitAnalyzer`.

Define `FunctionUnitUnwindInfoAnalyzer` that executes the Analysis during the stream.

The main reasons for doing this is:
- This is compatible with other use cases of `MCStreamer` implementers (e.g. `MCAsmStreamer` and `MCAssembler`).
- This approach result in on the fly analysis, if it was implemented like `RecordStreamer`, then all the instruction of the whole program have to be copied and be stored in the memory.
@amsen20
Copy link
Contributor Author

amsen20 commented Jun 25, 2025

Please help me in:

  • Breaking this implementation in re-viewable parts
  • Testing each part of the implementation properly

It's my first non-mechanical PR, so any suggestion, even little is appreciated.

Copy link

github-actions bot commented Jun 25, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- llvm/include/llvm/UnwindInfoChecker/FunctionUnitAnalyzer.h llvm/include/llvm/UnwindInfoChecker/FunctionUnitStreamer.h llvm/include/llvm/UnwindInfoChecker/FunctionUnitUnwindInfoAnalyzer.h llvm/include/llvm/UnwindInfoChecker/UnwindInfoAnalysis.h llvm/include/llvm/UnwindInfoChecker/UnwindInfoState.h llvm/lib/UnwindInfoChecker/FunctionUnitAnalyzer.cpp llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp llvm/lib/UnwindInfoChecker/FunctionUnitUnwindInfoAnalyzer.cpp llvm/lib/UnwindInfoChecker/Registers.h llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp llvm/lib/UnwindInfoChecker/UnwindInfoState.cpp llvm/include/llvm/DebugInfo/DWARF/DWARFCFIProgram.h llvm/include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h llvm/tools/llvm-mc/llvm-mc.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp b/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
index 54389aa7f..775ca35eb 100644
--- a/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
+++ b/llvm/lib/UnwindInfoChecker/FunctionUnitStreamer.cpp
@@ -17,7 +17,8 @@
 
 using namespace llvm;
 
-std::pair<unsigned, unsigned> CFIFunctionFrameStreamer::updateDirectivesRange() {
+std::pair<unsigned, unsigned>
+CFIFunctionFrameStreamer::updateDirectivesRange() {
   auto Frames = getDwarfFrameInfos();
   unsigned CurrentCFIDirectiveIndex = 0;
   if (hasUnfinishedDwarfFrameInfo()) {
@@ -42,7 +43,8 @@ void CFIFunctionFrameStreamer::updateAnalyzer() {
     return;
   }
 
-  const MCDwarfFrameInfo *LastFrame = &getDwarfFrameInfos()[FrameIndices.back()];
+  const MCDwarfFrameInfo *LastFrame =
+      &getDwarfFrameInfos()[FrameIndices.back()];
 
   auto DirectivesRange = updateDirectivesRange();
   ArrayRef<MCCFIInstruction> Directives;
@@ -62,7 +64,7 @@ void CFIFunctionFrameStreamer::updateAnalyzer() {
 }
 
 void CFIFunctionFrameStreamer::emitInstruction(const MCInst &Inst,
-                                           const MCSubtargetInfo &STI) {
+                                               const MCSubtargetInfo &STI) {
   updateAnalyzer();
   LastInstruction = Inst;
 }
diff --git a/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp b/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
index 872793015..ab615fbe6 100644
--- a/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
+++ b/llvm/lib/UnwindInfoChecker/UnwindInfoAnalysis.cpp
@@ -76,9 +76,9 @@ getUnwindRuleRefReg(const dwarf::UnwindTable::const_iterator &UnwindRow,
   }
 }
 
-DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context,
-                                       MCInstrInfo const &MCII, bool IsEH,
-                                       ArrayRef<MCCFIInstruction> Prologue)
+DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context, MCInstrInfo const &MCII,
+                                   bool IsEH,
+                                   ArrayRef<MCCFIInstruction> Prologue)
     : Context(Context), MCII(MCII), MCRI(Context->getRegisterInfo()),
       State(Context), IsEH(IsEH) {
 
@@ -121,7 +121,7 @@ DWARFCFIAnalysis::DWARFCFIAnalysis(MCContext *Context,
 }
 
 void DWARFCFIAnalysis::update(const MCInst &Inst,
-                                ArrayRef<MCCFIInstruction> Directives) {
+                              ArrayRef<MCCFIInstruction> Directives) {
   const MCInstrDesc &MCInstInfo = MCII.get(Inst.getOpcode());
 
   auto MaybePrevRow = State.getCurrentUnwindRow();

@amsen20
Copy link
Contributor Author

amsen20 commented Jun 25, 2025

I see the following plan fit for splitting the implementation:

  • Edits to DebugInfo can be their own PR, they are minor as well.
  • FunctionUnitStreamer and FunctionUnitAnalyzer can be moved to llvm/MC/, like other MCStreamer implementers, and they can be tested with unit testing separately.
  • The remaining can stay in this PR, depending on the previous two independent PRs.

Copy link
Contributor

@ilovepi ilovepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a large patch. It may make sense to split out parts that aren't required for complete functionality into their own patches.

For example, making addInstruction() public in DWARFCFIProgram.h, can probably be its own patch. I think there are several places of where you're just slightly refactoring the existing code.

As for the rest ... Its rather large, and reviewing incremental changes as you add functionality would make it significantly easier to provide high quality reviews.

#define LLVM_UNWINDINFOCHECKER_FUNCTIONUNITANALYZER_H

#include "llvm/ADT/ArrayRef.h"
#include <cstdio>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be needed.


} // namespace llvm

#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check your editor settings, since github seems unhappy w/ the lack of newline here.

Comment on lines 29 to 33
std::vector<unsigned> FrameIndices;

unsigned LastDirectiveIndex;
std::optional<MCInst> LastInstruction;
std::unique_ptr<FunctionUnitAnalyzer> Analyzer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally put private fields at the bottom of the class. Also please consider field ordering. using unsigned as the second field is probably going to end up w/ padding bytes.

#include "FunctionUnitAnalyzer.h"
#include "UnwindInfoAnalysis.h"
#include "llvm/ADT/ArrayRef.h"
#include <cstdio>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <cstdio>

I can't imagine why you'd need cstdio...

#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include <set>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return;
}

const auto *LastFrame = &getDwarfFrameInfos()[FrameIndices.back()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: generally prefer to use a real type over auto. using statements are a good option if you have some types that are hard to spell, plus that usually makes the code easier to read anyway. The style guide has more to say here: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

}
}

std::sort(SuperRegs.begin(), SuperRegs.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::sort(SuperRegs.begin(), SuperRegs.end());
llvm::sort(SuperRegs.begin(), SuperRegs.end());

Default to llvm::sort over std::sort: https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

case dwarf::UnwindLocation::Location::Constant:
case dwarf::UnwindLocation::Location::Unspecified:
case dwarf::UnwindLocation::Location::DWARFExpr:
// TODO here should look into expr and find the registers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO here should look into expr and find the registers.
// TODO: here should look into expr and find the registers.

Most tooling works on TODO: or FIXME:. Please update the other instances.

//===----------------------------------------------------------------------===//
///
/// \file
/// This file declares CFIFunctionFrameReceiver class.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to read all of this new code without an overview of what all these classes are for. Could you expand the comments in your new header files to describe the purpose of each class, not just its name?

Here, for example, it looks as if CFIFunctionFrameReceiver does basically nothing, because it's just a base class. Some other class (which?) will be given a pointer to an instance of this base class, and will send it information about a function it's scanning, but really the pointer will point to a subclass which actually does something with the information.

But I had to look in the implementation source file to confirm that, by seeing that all the methods are empty.

@@ -519,8 +526,16 @@ int main(int argc, char **argv) {
std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());
assert(MCII && "Unable to create instruction info!");

std::unique_ptr<FunctionUnitUnwindInfoAnalyzer> FUUIA(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a FunctionUnitUnwindInfoAnalyzer? The class defined in FunctionUnitUnwindInfoAnalyzer.h is called something else. Surely this code won't even compile in its current state?

Separately from that, you can make this unique_ptr without having to repeat the enormously long class name twice by writing

  auto FUUIA = std::make_unique<ClassName>(parameters);

@@ -0,0 +1,49 @@
//===----------------------------------------------------------------------===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as if you renamed all the classes defined in these header files, but left the file names themselves unchanged, so that now the file names don't match the class names and there's twice as much to remember?

}

std::optional<dwarf::CFIProgram>
DWARFCFIState::convert(MCCFIInstruction Directive) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function returning a std::optional, when it doesn't ever seem to return nullopt? As far as I can see, you always make a CFIProgram at the start, and always return it at the end. You may or may not put an instruction into it, but you still return the CFIProgram even if it's empty.

return;
}
Table.insert(Table.end(), NewRows.begin(), NewRows.end());
Table.push_back(Row);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this function at all, I'm afraid. It looks as if you first make Row by copying the last entry already in Table (if any); then you parse it into some more rows that you append to Table; and then you push Row on to the end of Table again. Wasn't it in Table already? Isn't that where you got it from in the first place?

if (!MaybeLLVMReg) {
assert(PrevLoc == NextLoc &&
"The dwarf register does not have a LLVM number, so the unwind info "
"for it should not change");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all of these checks assertions, rather than Context->reportWarning? Is this information expected to be consistent in all of these ways because your code constructed it itself?

switch (PrevLoc.getLocation()) {
case dwarf::UnwindLocation::Same:
case dwarf::UnwindLocation::RegPlusOffset:
if (Writes.count(MaybePrevRefReg.value())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come you don't need to check whether MaybePrevRefReg has a value at all in this test?

Context->reportWarning(
Inst.getLoc(),
formatv("unknown change happened to %{0} unwinding rule structure",
RegName));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's going on here, I'm afraid. From this code it looks as if, any time PrevLoc != NextLoc, we report one of the two warnings above, and it's only a question of which one. But that makes no sense – surely a register's location should change sometimes, e.g. when we push it on the stack and update the CFI to record that we did so. And one of your tests (spill-two-reg.s) looks as if it's got a push instruction that passes this test. But I don't see how it passes this test.

Can you explain how this code is supposed to tell the difference between a correct and an incorrect way to push a register and update its CFI?

if (PrevLoc.getLocation() == NextLoc.getLocation()) {
Context->reportWarning(
Inst.getLoc(),
formatv("unknown change happened to %{0} unwinding rule values",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These % prefixes on register names in your error messages are very x86-specific. In Arm, or many other architectures, register names aren't prefixed with %. Even in x86 they aren't, if you use the Intel rather than AT&T syntax. I think it would be better to leave the % off completely.

@@ -0,0 +1,43 @@
# RUN: llvm-mc %s --validate-cfi --filetype=null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as if this test is supposed to run without producing any warnings. Would it be a good idea to run FileCheck anyway, with a CHECK-NOT: warning: to make sure a warning doesn't show up by accident?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants