
[BoundsSafety] Refactoring Plan: Late Parsing for Bounds Safety Type Attributes #12758

@rapidsna

Background

The current downstream implementation of late parsing for bounds-safety attributes (counted_by, sized_by, ended_by, etc.) first creates a version of the type without late-parsed attributes, tracking late parsing information in a separate data structure along with the nested type index. When late parsing is triggered (e.g., after the whole structure body is parsed), it walks through the declaration type's nested type nodes to the level indicated by the index and inserts the wrapper type (e.g., CountAttributedType).
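The index-based insertion described above can be illustrated with a toy model. This is a minimal sketch, not the downstream code: `ToyType`, `wrapAtLevel`, and the other names are hypothetical stand-ins for Clang's type nodes and the real walker, and the "count expression" is just a string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for Clang type nodes (hypothetical, not real Clang classes).
struct ToyType {
  enum Kind { Builtin, Pointer, CountAttributed } kind;
  std::string text;               // Builtin: type name; CountAttributed: count expression
  std::shared_ptr<ToyType> inner; // Pointer: pointee; CountAttributed: wrapped type
};
using TypePtr = std::shared_ptr<ToyType>;

TypePtr builtin(std::string name) {
  return std::make_shared<ToyType>(ToyType{ToyType::Builtin, std::move(name), nullptr});
}
TypePtr pointerTo(TypePtr pointee) {
  return std::make_shared<ToyType>(ToyType{ToyType::Pointer, "", std::move(pointee)});
}
TypePtr counted(TypePtr wrapped, std::string count) {
  return std::make_shared<ToyType>(
      ToyType{ToyType::CountAttributed, std::move(count), std::move(wrapped)});
}

// Index-based approach: walk `level` pointer hops into the type, wrap the node
// found there, and rebuild the pointer chain above it on the way back out.
TypePtr wrapAtLevel(const TypePtr &t, unsigned level, const std::string &count) {
  if (level == 0)
    return counted(t, count);
  assert(t->kind == ToyType::Pointer && "index points past the pointer chain");
  return pointerTo(wrapAtLevel(t->inner, level - 1, count));
}

// Renders the toy type as text so the result can be inspected.
std::string print(const TypePtr &t) {
  switch (t->kind) {
  case ToyType::Builtin:
    return t->text;
  case ToyType::Pointer:
    return print(t->inner) + "*";
  case ToyType::CountAttributed:
    return print(t->inner) + " counted_by(" + t->text + ")";
  }
  return "";
}
```

The index only makes sense relative to the shape of the type at the time it was recorded, which is why this scheme gets fragile once the type can be rebuilt or desugared between recording and insertion.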

The upstream PR llvm#179612 ("[BoundsSafety] Support bounds-safety attributes in type positions") introduces a new approach for handling these attributes in type positions.

Goal

Adopt the upstream approach downstream. The upstream PR only supports counted_by on struct fields. Downstream, we additionally support these attributes on function parameters and more late-parsed attributes like ended_by. Adopting the upstream approach means adjusting all of this downstream code to use the same mechanism consistently.

I have verified locally that this approach can attach the counted_by attribute to function parameters, and confirmed the result in the AST dump.

Approach

Instead of tracking a type position index for late-parsed attributes, create a placeholder type (LateParsedAttrType) during initial type construction. When late parsing occurs, TreeTransform rebuilds the declaration, replacing the placeholder with the concrete type (e.g., CountAttributedType). This avoids index-tracking issues with complex types (e.g., templated C++ types) and was agreed upon with @AaronBallman and other Clang contributors.
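The placeholder-then-rebuild idea can be sketched with a toy type tree. This is an illustration under stated assumptions, not the upstream implementation: `Node`, `rebuild`, and `hasPlaceholder` are hypothetical names, the deferred tokens are modeled as a string, and the `resolve` callback stands in for actually parsing the attribute argument once the enclosing declaration is complete.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Toy type tree (hypothetical; real Clang types are far richer).
struct Node {
  enum Kind { Builtin, Pointer, LateParsedPlaceholder, CountAttributed } kind;
  std::string text;            // Builtin: name; placeholder: deferred tokens;
                               // CountAttributed: resolved count expression
  std::shared_ptr<Node> inner; // wrapped or pointee type, null for Builtin
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Node::Kind kind, std::string text, NodePtr inner) {
  return std::make_shared<Node>(Node{kind, std::move(text), std::move(inner)});
}

// Rebuilds the type recursively. Each placeholder is replaced, in place, by a
// CountAttributed node; no position index is needed because the placeholder
// already sits where the concrete type belongs.
NodePtr rebuild(const NodePtr &t,
                const std::function<std::string(const std::string &)> &resolve) {
  switch (t->kind) {
  case Node::Builtin:
    return t;
  case Node::Pointer:
    return make(Node::Pointer, "", rebuild(t->inner, resolve));
  case Node::LateParsedPlaceholder:
    return make(Node::CountAttributed, resolve(t->text), rebuild(t->inner, resolve));
  case Node::CountAttributed:
    return make(Node::CountAttributed, t->text, rebuild(t->inner, resolve));
  }
  return t;
}

// True if any placeholder is still present in the type.
bool hasPlaceholder(const NodePtr &t) {
  if (!t)
    return false;
  return t->kind == Node::LateParsedPlaceholder || hasPlaceholder(t->inner);
}

// Renders the toy type as text so the result can be inspected.
std::string print(const NodePtr &t) {
  switch (t->kind) {
  case Node::Builtin:
    return t->text;
  case Node::Pointer:
    return print(t->inner) + "*";
  case Node::LateParsedPlaceholder:
    return print(t->inner) + " <late:" + t->text + ">";
  case Node::CountAttributed:
    return print(t->inner) + " counted_by(" + t->text + ")";
  }
  return "";
}
```

In this model, a type built as pointer-to-(placeholder wrapping pointer-to-int) is rebuilt into pointer-to-(counted pointer-to-int) by a single `rebuild` call, regardless of how deep the placeholder sits.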

Key elements of the upstream PR:

  • Introduces LateParsedAttrType placeholder type to defer attribute processing until the complete struct/function definition is available
  • Moves LateParsedDeclaration and LateParsedAttribute out of the Parser class
  • Moves LateParsedAttrList out of Parser.h into DeclSpec.h
  • Introduces LateParsedTypeAttribute (child of LateParsedAttribute) for type attribute handling
  • Adds LateParsedAttribute vectors to DeclaratorChunk, Declarator, and DeclSpec
  • Resolves placeholders via TreeTransform to concrete types (e.g., CountAttributedType)

Execution Plan

Properly cherry-picking and transitioning to this approach requires a large change downstream, so it should be done incrementally. The tasks below are refactoring steps that can be landed independently before actually switching to the new approach.

Each task will have its own GitHub issue so they can be worked on in parallel. I am using Claude Code agents to orchestrate tasks and help prepare preliminary patches for individual sub-tasks where possible, so contributors can easily pick them up and polish.

Once the refactoring is complete, features not yet available upstream (such as function parameter support and ended_by) can be incrementally upstreamed.

Terminology

  • LateParsedAttrType — AST type node; placeholder in the type system
  • LateParsedTypeAttribute — Data structure subtyping LateParsedAttribute (not an AST node)
  • LateParsedAttrInfo — Current DeclaratorChunk workaround (to be removed)

Tasks

Validation: All tasks must pass existing lit tests (llvm-lit --filter='counted-by|bounds-safety|sized-by' clang/test and llvm-lit clang/test/BoundsSafety). Each task should be self-contained and land independently.

Where possible, tasks are structured to cherry-pick parts of the upstream PR llvm#179612 independently, marked with (cherry-pick). I will split the upstream PR into smaller PRs/patches so each can be cherry-picked to the corresponding task. Tasks without this marker are downstream-only refactoring needed to prepare for the new approach.

Phase 0 — Independent Foundations (parallel)

  • T1: Move late parsing for parameters trigger from ParseDeclaratorInternal to ActOnFunctionDeclarator (#12766)
    Currently, late parsing for parameters is triggered early during declarator parsing to prevent FunctionDecl from being immediately merged with a previous declaration when the bounds attributes differ. This should change because the new approach first creates LateParsedAttrType placeholders during declarator parsing and replaces them after the declaration is fully formed. Moving the trigger to after the full FunctionDecl is created validates that late parsing for parameters works from the new site. Function redeclaration must be verified.

  • T4: Extract diagnostics from ConstructDynamicBoundType / ConstructCountAttributedType / ConstructDynamicRangePointerType (#12765)
    These CRTP TypeVisitor classes in SemaDeclAttr.cpp walk the type to a nested level index and construct the bounds-attributed type there, mixing type construction with diagnostics. Extract the diagnostics into flat functions that run before construction; the conditions that currently trigger diagnostics then become invariants enforced by assert().
    Why: The new approach constructs types via TreeTransform replacement rather than these index-based visitors. However, the original construction is still needed for APINotes and template instantiation. Extracting diagnostics makes them callable independently from both the old and new code paths.
    Note: This migration is not strictly necessary for T10. If it risks blocking T10, an alternative is to create a separate diagnostic function for T10 first and refactor afterwards.

  • T7: Move LateParsedAttribute outside Parser class; move LateParsedAttrList to DeclSpec.h (cherry-pick) (#12764)
    Cherry-pick the structural moves from the upstream PR llvm/llvm-project#179612 ("[BoundsSafety] Support bounds-safety attributes in type positions"):

    1. Move LateParsedDeclaration and LateParsedAttribute outside the Parser class (stay in Parser.h)
    2. Move LateParsedAttrList from Parser.h to DeclSpec.h
    3. Forward-declare LateParsedAttribute in DeclSpec.h (opaque)
    4. Replace LateParsedAttrInfo usage in DeclaratorChunk with LateParsedAttribute*
    5. Remove LateParsedAttrInfo
  • T6: Handle non-late-parsed counted_by/ended_by as type attributes in SemaType.cpp (#12767)
    Currently both the late-parsed and non-late-parsed paths go through SemaDeclAttr.cpp (declaration attribute handling). Move the non-late-parsed path to SemaType.cpp (type attribute handling), while the late-parsed path remains in SemaDeclAttr.cpp until T10 lands. Whether this is feasible as a standalone change needs investigation; if it is not, the task will be absorbed into T10. Why: The new approach treats these as type attributes, not declaration attributes. Migrating the non-late-parsed path first reduces the scope of T10.
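The diagnostics extraction described in T4 can be sketched as follows. This is a simplified illustration, not the actual SemaDeclAttr.cpp code: `ToyTypeInfo`, `diagnoseCountedBy`, `constructCountedType`, and the diagnostic messages are all hypothetical, and the two checks are placeholders for whatever conditions the real visitors diagnose.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, heavily simplified summary of the type being attributed.
struct ToyTypeInfo {
  bool isPointer;      // attribute target is a pointer
  bool countIsInteger; // count expression has integer type
};

// Flat diagnostic function: returns a message if the attribute must be
// rejected, or std::nullopt if construction may proceed.
std::optional<std::string> diagnoseCountedBy(const ToyTypeInfo &info) {
  if (!info.isPointer)
    return std::string("attribute target is not a pointer");
  if (!info.countIsInteger)
    return std::string("count expression is not an integer");
  return std::nullopt;
}

struct ToyCountedType {
  std::string count;
};

// Construction no longer diagnoses anything. The former error conditions are
// preconditions, so a caller that skipped the diagnostic step trips the assert.
ToyCountedType constructCountedType(const ToyTypeInfo &info, std::string count) {
  assert(!diagnoseCountedBy(info) && "run diagnostics before construction");
  return ToyCountedType{std::move(count)};
}
```

Because `diagnoseCountedBy` has no dependency on the construction step, both an index-based visitor and a TreeTransform-style replacement could call it before building the type.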

Phase 1 — Data Structures + Diagnostic Reconciliation (parallel chains)

  • T8: Introduce LateParsedTypeAttribute (upstream/cherry-pick), upstreamed as llvm/llvm-project#192799 ("[BoundsSafety][NFC] Introduce LateParsedTypeAttribute for late-parsed type attributes"). Subtypes LateParsedAttribute for late-parsed type attributes. Independent of T7. Why: The new approach needs a distinct data structure to carry type-attribute-specific information (e.g., the pointer nesting level where the placeholder was inserted) through the late parsing pipeline.
  • T9: Introduce LateParsedAttrType (upstream/cherry-pick), tracked as #13000 ("[BoundsSafety][NFC] Introduce LateParsedAttrType AST placeholder type"). This is a placeholder type in the AST that is replaced with a concrete type, e.g., CountAttributedType, during late parsing. Depends on T8 because LateParsedAttrType embeds LateParsedTypeAttribute.
  • T5: Merge redundant diagnostics — Merge handlePtrCountedByEndedByAttr (BoundsSafety-enabled pass in SemaDeclAttr.cpp) and handleCountedByAttrField (default pass), which have overlapping diagnostic logic, including the functions they call. Also reconcile with CountArgChecker/RangeArgChecker in SemaType.cpp. Depends on T4.

Phase 2 — Diagnostic API Design

  • T2: Split diagnostics into DeclContext vs Type

    • DeclContext diagnostics (needs Decl) — run right after late parsing (whole struct / function prototype)
    • Type diagnostics (nested type related) — called from two places:
      • When attaching LateParsedAttrType (or before) — invariants for LateParsedAttrType checked here
      • When replacing LateParsedAttrType with concrete CountAttributedType and friends — invariants for CountAttributedType checked here; argument expression diagnostics here too

    Why: The new approach has two distinct points where diagnostics run: (1) when the placeholder is inserted during type construction, and (2) when the placeholder is replaced as part of late parsing for a declaration. Each site needs a different subset of diagnostics. This split defines the API that T10 will use. Depends on T5.

  • T11: Refactor merged handler API for multiple callers — Refactor the merged handler so the core logic can be called from both APINotes (where Level is provided explicitly as a parameter) and the new late parsing logic once it's introduced (where the placeholder type already knows its position). Depends on T2.
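One possible shape for the T11 refactoring is a core routine with two thin entry points. The sketch below is an assumption about the design, not the planned API: `ToyDeclType`, `attachCountCore`, `attachFromApiNotes`, and `attachFromPlaceholder` are hypothetical names, and a declaration type is modeled as a vector with one slot per pointer level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy declaration type: one slot per pointer nesting level.
// An empty string means that level carries no count attribute.
struct ToyDeclType {
  std::vector<std::string> countAtLevel;
};

// Core logic shared by both callers. Returns false where the real handler
// would emit a diagnostic (level out of range, or level already attributed).
bool attachCountCore(ToyDeclType &ty, unsigned level, const std::string &count) {
  if (level >= ty.countAtLevel.size())
    return false;
  if (!ty.countAtLevel[level].empty())
    return false;
  ty.countAtLevel[level] = count;
  return true;
}

// APINotes-style entry point: the nesting level is an explicit parameter.
bool attachFromApiNotes(ToyDeclType &ty, unsigned level, const std::string &count) {
  return attachCountCore(ty, level, count);
}

// Late-parsing-style entry point: the placeholder already records where it
// was inserted, so the caller does not pass a level.
struct ToyPlaceholder {
  unsigned level;
  std::string deferredTokens;
};

bool attachFromPlaceholder(ToyDeclType &ty, const ToyPlaceholder &placeholder) {
  return attachCountCore(ty, placeholder.level, placeholder.deferredTokens);
}
```

Keeping the level lookup out of the core routine means the checks run identically regardless of which path supplied the position.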

Phase 3 — Convergence

  • T10: New late parsing mechanism with LateParsedAttrType (cherry-pick + extend)
    Cherry-pick the remaining core of the upstream PR llvm/llvm-project#179612 ("[BoundsSafety] Support bounds-safety attributes in type positions"): LateParsedAttrType placeholder creation during processLateTypeAttrs, ProcessLateParsedTypeAttributesForFields to replace placeholders via RebuildTypeWithLateParsedAttr, and LateParsedAttribute vectors in DeclaratorChunk/Declarator/DeclSpec. On top of that, introduce ProcessLateParsedTypeAttributesForParameters for late parsing of attributes on function parameter types, and handle ended_by in a similar manner. This can be split into smaller sub-tasks. Absorbs T6 if not standalone. Replaces T1's transitional function. Depends on T1, T4, T9, T2, and T11.

Labels: clang:bounds-safety (issue relating to the experimental -fbounds-safety feature in Clang)
