Cache repeated string instances in the lexer (.NET 9) #38

Open
alexrp opened this issue Feb 22, 2023 · 2 comments
Labels
area: analysis Issues related to language analyses. state: approved Enhancements and tasks that have been approved.

alexrp commented Feb 22, 2023

When lexing a typical source file, there will be many repeated strings: identifiers, literals, whitespace, and so on. We can't intern these, but it would make sense to cache token strings up to a certain length and return the same instance instead of building them up repeatedly.

To implement this, instead of building up the token string in a StringBuilder, we would keep track of where the token starts and ends. When creating the token, if its length is below our caching threshold, we first look it up in the token cache. For longer tokens, we shouldn't bother, as the lookup would take too long to be worth it.
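
As a rough illustration of the approach described above, here is a minimal sketch of a span-keyed cache. Everything in it is an assumption made for the example rather than the project's actual design: the `TokenStringCache` name, the 32-character threshold, the 1024-slot direct-mapped table, and the FNV-1a hash are all placeholders.

```csharp
using System;

// Hypothetical sketch: names, sizes, and the hash function are illustrative only.
public sealed class TokenStringCache
{
    // Assumed caching threshold; the right value would need measuring.
    private const int MaxCachedLength = 32;

    // Direct-mapped table; the length must stay a power of two for the mask below.
    private readonly string[] _slots = new string[1024];

    public string GetOrAdd(ReadOnlySpan<char> text)
    {
        // Longer tokens rarely repeat and cost more to hash and compare,
        // so they bypass the cache entirely.
        if (text.Length > MaxCachedLength)
            return text.ToString();

        var index = (int)(Hash(text) & (uint)(_slots.Length - 1));
        var cached = _slots[index];

        // Hit: hand back the existing instance without allocating.
        if (cached != null && text.SequenceEqual(cached.AsSpan()))
            return cached;

        // Miss or collision: allocate once and overwrite the slot.
        var created = text.ToString();
        _slots[index] = created;
        return created;
    }

    private static uint Hash(ReadOnlySpan<char> text)
    {
        unchecked
        {
            // FNV-1a over UTF-16 code units.
            var hash = 2166136261u;
            foreach (var c in text)
                hash = (hash ^ c) * 16777619u;
            return hash;
        }
    }
}
```

The lexer would then track only the start and end offsets of a token and call something like `cache.GetOrAdd(source.AsSpan(start, end - start))` when the token is created. A direct-mapped table keeps memory bounded at the cost of evicting on collisions; a real implementation might prefer a different eviction policy.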

@alexrp alexrp added state: approved Enhancements and tasks that have been approved. type: performance area: analysis Issues related to language analyses. labels Feb 22, 2023
@alexrp alexrp added this to the v2.0 milestone Feb 22, 2023
@alexrp alexrp self-assigned this Feb 22, 2023

alexrp commented Mar 18, 2023

Along with this work, we should also create lexed strings through SourceText.ToString(SourceTextSpan).

@alexrp alexrp modified the milestones: v2.0, Future Jan 1, 2024
@alexrp alexrp removed their assignment Jan 27, 2024

alexrp commented Jul 12, 2024

dotnet/runtime#27229 should make this quite a bit easier to implement in .NET 9.
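
Assuming the runtime change referenced here is the span-based alternate lookup that .NET 9 added to `HashSet<T>` and `Dictionary<TKey, TValue>`, the cache could be sketched roughly as follows. The `SpanTokenCache` name and the 32-character threshold are hypothetical, and the set grows without bound in this simplified form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch for .NET 9 or later; the type name and threshold are illustrative only.
public sealed class SpanTokenCache
{
    // Assumed caching threshold; the right value would need measuring.
    private const int MaxCachedLength = 32;

    // StringComparer.Ordinal supports ReadOnlySpan<char> as an alternate key in .NET 9.
    private readonly HashSet<string> _strings = new(StringComparer.Ordinal);

    public string GetOrAdd(ReadOnlySpan<char> text)
    {
        // Longer tokens are not worth the lookup cost.
        if (text.Length > MaxCachedLength)
            return text.ToString();

        // Look the span up directly, without allocating a string for the key.
        var lookup = _strings.GetAlternateLookup<ReadOnlySpan<char>>();

        if (lookup.TryGetValue(text, out var cached))
            return cached;

        // First occurrence: allocate once and remember the instance.
        var created = text.ToString();
        _strings.Add(created);
        return created;
    }
}
```

Compared with a hand-rolled table, this leaves hashing and collision handling to the framework, which is presumably what makes the work easier.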

@alexrp alexrp changed the title Cache repeated string instances in the lexer Cache repeated string instances in the lexer (.NET 9) Jul 12, 2024