Cache repeated string instances in the lexer (.NET 9) #38
Labels
area: analysis
Issues related to language analyses.
state: approved
Enhancements and tasks that have been approved.
When lexing a typical source file, there are going to be a lot of repeated strings: identifiers, literals, whitespace, and so on. We can't intern these, but it would make sense to cache tokens up to a certain length and return the same instance instead of building them up repeatedly.
To implement this, instead of building up the token string in a `StringBuilder`, we would keep track of where the token starts and ends. When creating the token, if the length is below our caching threshold, we first look it up in the token cache. For larger tokens, we shouldn't bother, as the lookup will take too long to be worth it.
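As a rough sketch of the idea, the cache could look something like the following. Everything here is an assumption for illustration: the `TokenCache` class, the `GetOrAdd` method, and the threshold of 32 characters are hypothetical names and values, not part of the existing code base. It relies on the span-based alternate lookup that `HashSet<string>` gained in .NET 9, so the cache can be probed with a slice of the source text without allocating a string first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: TokenCache, GetOrAdd, and MaxCachedLength are
// illustrative names, not existing APIs in this project.
public sealed class TokenCache
{
    // Assumed threshold; the right value would need to be measured.
    private const int MaxCachedLength = 32;

    private readonly HashSet<string> _strings = new(StringComparer.Ordinal);

    // .NET 9 alternate lookup: lets us query the set by ReadOnlySpan<char>.
    private readonly HashSet<string>.AlternateLookup<ReadOnlySpan<char>> _lookup;

    public TokenCache()
    {
        _lookup = _strings.GetAlternateLookup<ReadOnlySpan<char>>();
    }

    // Returns the token text in source[start..end), reusing a cached
    // instance when the token is shorter than the threshold.
    public string GetOrAdd(ReadOnlySpan<char> source, int start, int end)
    {
        ReadOnlySpan<char> token = source[start..end];

        // Long tokens bypass the cache entirely.
        if (token.Length >= MaxCachedLength)
        {
            return token.ToString();
        }

        // Cache hit: return the existing instance without allocating.
        if (_lookup.TryGetValue(token, out var cached))
        {
            return cached;
        }

        // Cache miss: allocate once and remember the instance.
        string created = token.ToString();
        _strings.Add(created);
        return created;
    }
}
```

With this shape, lexing `"foo = foo + 1"` and requesting the ranges `0..3` and `6..9` would hand back the same `string` instance both times, so `ReferenceEquals` on the two results is true. The sketch is not thread-safe and never evicts entries; whether either matters depends on how the lexer is used.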