-
-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cache common StringDoc
types
#1494
base: main
Are you sure you want to change the base?
feat: cache common StringDoc
types
#1494
Conversation
e85ecfd
to
bd7424d
Compare
When I saw the title of this PR I was picturing the code making use of something like
I would be a bit worried it would cause problems but if you format everything within Maybe it is better to include the common symbols like you mentioned and keep it as a simple switch.
I did some poking around and it looks like When debugging though, it ends up in an interal class I also noticed that |
Do you think it's worth only using So it looks like roslyn is smart, and when it parses a new symbol or token ie an identifier, it will use an internal cache called a For example calling Like you, I had also assumed that
This is definitely worth exploring. Should I close this PR in favour of one caching all
Yeah microsoft loves using these 😅 I did wonder if
I don't think this is currently worth doing, after #1491 csharpier doesn't construct many |
Oh yeah, csharpier should definitely be using Update:
This won't be as easy I'd hoped 😅 some syntax have multiple Im guessing the solution is to check if a file is associated with a special case and insert the relevant It might also be worth adding a EditDid some multi cursor magic and created a giant switch statement for all known |
bd7424d
to
86bcf24
Compare
86bcf24
to
6eb5c42
Compare
6eb5c42
to
a12e823
Compare
53671cd
to
65dd14a
Compare
80d2d27
to
9c58e07
Compare
9c58e07
to
fdfe56b
Compare
I agree, this should be perfect for our needs here.
On second thought, even if I've overlooked an issue, it would probably take millions of LOC to cause an issue. 🙏
I suspect it was just a series of freak results, who knows perhaps its accurate 🤷 New changes saves 3.2 MB on the I've since deleted the original benchmarks but it looks like EDIT: pretty sure this has a thread safety issue. Different I fixed this issue by using Benchmarks (Done on battery power, timing is 100% random)Main
String Doc
StringDoc WeakTable
|
a96ecba
to
a424d05
Compare
a424d05
to
582d6c2
Compare
Benchmarks (more accurate this time)Before
Before 10x
After
After 10x
|
37292c0
to
c1256e5
Compare
c1256e5
to
64d68b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna test this out on a couple of the large repos, assuming that doesn't reveal any problems I'll merge it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial testing shows at least one issue - there may just be a bad mapping that is turning {{
into {
and }}
into }
I'm a little suprised the tests didn't catch this one 🤔 I guess roslyn special cases this I had a similar issue with |
a53c4d2
to
42ab7ab
Compare
Fixed this with a I hope roslyn doesn't do this with other Kinds. Do you want me to update |
Add
StringDoc.Create
to reuse commonStringDoc
types where possible. Saving 1.7 MB, 2% of memory usage.Most common
StringDoc
types (total 97,000)I did most of the common single character symbols, it might be worth adding other common symbols like
int
,string
,true
,var
,new
etc. The switch statements of this size are lowered into an efficient hash/binary tree so I doubt it makes a meaningful performance impact.Is it worth exploring caching all new
StringDoc
types? There are a total of 674 uniqueStringDoc
symbols in the example benchmark, I wonder how many unique symbols are in a real world 1M loc project. Perhaps a cache for each file or using a staticConditionalWeakTable
.I can see this causing a memory leak if implemented incorrectly😅
A future change could look into creating a
SpanDoc
type that containsSyntaxToken.Span
, this is used to copy the relevant section of the original code into the new text without allocating.