Skip to content

Implement smart linking, take 3 #1382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

Rangi42
Copy link
Contributor

@Rangi42 Rangi42 commented Mar 27, 2024

Fixes #82.

Replaces #654 and #782. Many thanks to @ISSOtm and @daid for getting smart linking to this point!

  • Allow referencing sections from the linkerscript? Or even from the assembly too?
  • Add a mechanism so that RGBASM-elided references are still emitted (lacking this is why test/link/smart/chain-wram.asm fails)
  • Add more tests covering more edge cases
    • Multiple -s referenced sections
    • Any still-uncovered code (use contrib/coverage.bash)
  • Sections should not be purged if fully constrained
  • Apply smart linking after linker script (avoids errors & interacts with above)

@Rangi42 Rangi42 added enhancement Typically new features; lesser priority than bugs rgblink This affects RGBLINK labels Mar 27, 2024
@Rangi42 Rangi42 added this to the v0.8.0 milestone Mar 27, 2024
@Rangi42 Rangi42 requested a review from ISSOtm March 27, 2024 19:30
@Rangi42 Rangi42 changed the title Implement smart linking Implement smart linking, take 3 Mar 27, 2024
This was referenced Mar 27, 2024
@Rangi42 Rangi42 force-pushed the smart-linking branch 3 times, most recently from 4c242e1 to 7f81f1e Compare March 27, 2024 20:32
@Rangi42
Copy link
Contributor Author

Rangi42 commented Mar 27, 2024

One of the test cases from #782 currently fails. Note that #782 failed it as well, so that's a TODO.

smart/chain-wram.smart.bin /tmp/tmp.ETsdvFO98G differ: char 3, line 1
00:0000 
< 00:0000: 00c0 0102 0000 0000 0000 0000 0000 0000  ................
---
00:0000 
> 00:0000: 00c0 0000 0000 0000 0000 0000 0000 0000  ................
smart/chain-wram.smart.out.bin mismatch!

This is the asm:

SECTION "root", ROM0[0]
    dw WRAMLabel

; This section should be kept thanks to the reference from the WRAM section
SECTION "A", ROM0
Label:
    db $01, $02
.end:

SECTION "wram", WRAM0
WRAMLabel:
    ds Label.end - Label

SECTION "UnRef", ROM0
UnRef:
    db UnRef

@Rangi42 Rangi42 added the WIP This PR is a work in progress label Mar 27, 2024
@Rangi42 Rangi42 force-pushed the smart-linking branch 3 times, most recently from f67797a to 64cedb2 Compare March 28, 2024 00:48
@daid
Copy link
Contributor

daid commented Mar 28, 2024

One of the test cases from #782 currently fails. Note that #782 failed it as well, so that's a TODO.

smart/chain-wram.smart.bin /tmp/tmp.ETsdvFO98G differ: char 3, line 1
00:0000 
< 00:0000: 00c0 0102 0000 0000 0000 0000 0000 0000  ................
---
00:0000 
> 00:0000: 00c0 0000 0000 0000 0000 0000 0000 0000  ................
smart/chain-wram.smart.out.bin mismatch!

This is the asm:

SECTION "root", ROM0[0]
    dw WRAMLabel

; This section should be kept thanks to the reference from the WRAM section
SECTION "A", ROM0
Label:
    db $01, $02
.end:

SECTION "wram", WRAM0
WRAMLabel:
    ds Label.end - Label

SECTION "UnRef", ROM0
UnRef:
    db UnRef

Not sure why that would be a valid test. There is no way to know where section "A" is located, you only know how big it is.

@ISSOtm
Copy link
Member

ISSOtm commented Mar 28, 2024

RGBASM is able to compute the difference between these two labels itself, and so it does: the ds line is processed as if it was ds 2, and RGBASM does not emit a reference to Label nor Label.end.

Yet, since section "wram" is kept, and it contains a source-code reference to Label, "A" should be kept by "smart linking". Shouldn't it?

@daid
Copy link
Contributor

daid commented Mar 28, 2024

At a detailed source level, yes, it contains a reference. But at a practical level higher overview, it does not contain a reference. It only contains a reference to the size (which is constant, and thus can be pre-computed before linking)

IMHO, it's one of those edge cases that does not require to work as there is no practical usage of this in any way. While fixing it is annoyingly needlessly complicated.

@aaaaaa123456789
Copy link
Member

Before this is merged, I should probably say that specifying the starting sections in the command line isn't the best idea.
I'd expect this to be given as a section attribute in the source code itself: SECTION "Foobar", ROM0, NODISCARD (or whatever you want to call the "treat this section as always referenced" attribute).

Of course, this doesn't mean that the command line flag cannot be used, but I'd expect to see a way to specify this in code — which would also function as an escape hatch for the above "should this section be included?" problem.

@Rangi42
Copy link
Contributor Author

Rangi42 commented Mar 28, 2024

IIRC we wanted the linker script to also support designating root sections.

@Rangi42 Rangi42 force-pushed the smart-linking branch 5 times, most recently from 8fd60eb to 656b87d Compare March 28, 2024 21:27
@Rangi42
Copy link
Contributor Author

Rangi42 commented Mar 28, 2024

Proposal: add an RPN_REFCONST expression type, which would be like a constant expression, but also tracking lists of the symbol and section names which went into computing its value. When an asm-time constant is emitted that depends on a name and/or on other REFCONST expressions, build up the list. Then RGBLINK's patch_FindReferencedSections would take those names into account for finding references.

(This would be a new expression type so the existing RPN_CONST would remain unchanged, and would not be bloated by 0 section and symbol counts for the majority of cases where there aren't any.)

@ISSOtm @daid What do you think?

Edit: Oh, y'all discussed the exemplifying test case already. I agree with ISSOtm that using ld a, BANK(Label) or dw Label - OtherLabel or so on is intuitively as much of a reference as ld hl, Label is; the asm-time constant evaluation is just an optimization by us.

@Rangi42 Rangi42 force-pushed the smart-linking branch 4 times, most recently from fe2d958 to e9af039 Compare March 29, 2024 18:47
@Rangi42 Rangi42 force-pushed the smart-linking branch 2 times, most recently from 62da49f to e958055 Compare June 12, 2024 17:27
@Rangi42 Rangi42 modified the milestones: v0.8.0, v0.9.0 Jun 13, 2024
@Rangi42 Rangi42 force-pushed the smart-linking branch 3 times, most recently from aa7a745 to 6d6da0d Compare June 19, 2024 00:36
@Rangi42 Rangi42 removed the request for review from ISSOtm June 28, 2024 21:09
@Rangi42 Rangi42 removed this from the v0.9.0 milestone Aug 6, 2024
@Rangi42 Rangi42 force-pushed the smart-linking branch 2 times, most recently from ca7ff98 to b0771cb Compare August 8, 2024 17:56
@Rangi42 Rangi42 force-pushed the smart-linking branch 2 times, most recently from e645325 to 785f338 Compare August 27, 2024 18:23
@Rangi42
Copy link
Contributor Author

Rangi42 commented May 7, 2025

Instead of a RPN_REFCONST opcode, maybe we could just output unoptimized RPN opcodes for expressions that RGBASM currently optimizes due to being known-constant at assembly time. For example, Label2 - Label1, or BANK(Label) when RGBASM knows Label's bank, or Label & $ff when RGBASM knows Label is 8-bit aligned, or so on. RGBASM would then need to add a way to know that, even though those expressions store whole RPN trees, they still have a known constant value.

The downside there would be the same as adding RPN_REFCONST: object files would grow larger, and often the growth would be unnecessary (if the user is not going to use smart linking, or if the RPN-referenced labels would already have been referenced in other places). Maybe we could add an opt-out flag for RGBASM, so the user could assert "I'm not going to use smart linking in RGBLINK, output minimal object files"?

@ISSOtm What do you think?

@Rangi42
Copy link
Contributor Author

Rangi42 commented May 7, 2025

Also, this only implements the rgblink -s "section name" flag for "referencing" sections.

Do we want linkerscripts to be able to reference sections, e.g. as "section name" REF? If so, would smart linking occur due to the presence of a REF section even without a -s flag?

We could go further -- should .asm files be able to reference sections, e.g. as SECTION "name", ROM0, REF? (We would probably want a keyword less likely to conflict with existing label/symbol names...) That would require an update to the object file format, probably using one of the unused bits in the section type. And again, if so, would one force smart linking to occur?

(Note that although we've been colloquially calling these manually-referenced sections "root" sections in issue+PR+Discord discussion, the man page and source code don't actually use that term. So we wouldn't want to add e.g. a ROOT keyword unless we are going to make that an official term for them.)

@Rangi42 Rangi42 added this to the 0.9.4 milestone May 7, 2025
@ISSOtm
Copy link
Member

ISSOtm commented May 7, 2025

The linker script can already reference existing sections, possibly as FLOATING.

I'm not sure what the use case is for referencing sections without an actual reference (be it via a symbol, or via the section's name directly e.g. STARTOF("section name")). So it'd be simpler not to do anything about it until a use case is found.

@Rangi42
Copy link
Contributor Author

Rangi42 commented May 7, 2025

The linker script can already reference existing sections, possibly as FLOATING.

What do you mean? If I have a SECTION "Foo", ROM0 and then the linkerscript has FLOATING "Foo", that does not count as a reference to "Foo". Only passing rgblink -s "Foo" counts (or the recursive process of finding what's referenced by the -s section(s)).

I'm not sure what the use case is for referencing sections without an actual reference (be it via a symbol, or via the section's name directly e.g. STARTOF("section name")). So it'd be simpler not to do anything about it until a use case is found.

The rgblink -s "section name" flag is for "referencing sections without an actual reference" -- it designates the "root" section(s) from which the whole recursive "search for references" process starts.

ASMotor's smart linking feature works by doing SECTION "Example", ROM0, ROOT, and IIRC we at least mentioned the possibility of allowing something like that in our implementation. But at least so far, this PR only adds a CLI -s "Example" flag. I was question if/how we should allow other ways to specify the root initially referenced sections, in .asm or .link input files.

@ISSOtm
Copy link
Member

ISSOtm commented May 7, 2025

That sounds wrong: the behaviour I'd find intuitive is

  • fully-constrained sections should also be automatically be treated as roots
  • sections mentioned in the linker script are also roots (is there a case where the placement of a section is important, but not its existence?) even if not fully constrained
  • and the CLI could be used to avoid dropping sections that aren't referenced via symbols (but we could invite people to use the linker script instead, using FLOATING to avoid over-constraining the section)

This seems sufficient, as the roots of the Control Flow Graph (CFG, borrowed from compiler literature) are

  • the header
  • the interrupt handlers

...both of which are fully-constrained, and should reference the rest of the graph via symbols.

@Rangi42
Copy link
Contributor Author

Rangi42 commented May 7, 2025

That makes sense to me, thanks.

Any opinion on the two proposed possibilities for how to change asm-time known-constant expressions involving labels, so that those labels still get passed to rgblink and counted as references? (Option 1, option 2.)

@ISSOtm
Copy link
Member

ISSOtm commented May 7, 2025

I'm starting to think that registering that information separately, rather than trying to muck with expressions, is a simpler solution. Notably, it separates that concern from the already-complex expression systems (plural, because there's the “client” part in RGBASM, and the “server” part in RGBLINK); and we can then use a dedicated, more compact format in the object file (for example, adding a [ID of referenced symbols; _] array to section data, which shouldn't be too much data added to object files, at 4 bytes/symbol with easy deduplication by stuffing all of the IDs into a HashSet in RGBASM).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Typically new features; lesser priority than bugs rgblink This affects RGBLINK WIP This PR is a work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smart linking
4 participants