Accelerating scc startup speed with code generation #594

Open
wants to merge 1 commit into
base: master

apocelipes
Contributor

We skip the JSON serialization and convert the data directly into Go code, which saves considerable startup time.

"scc --languages" is now 40% faster:
[image]

Running on a small code base can be 7% faster, and on big code bases the gain is 1-3%:
[images: small code base, big code base]

Pros:

  • Code changes are more readable
  • Faster startup
  • Human-readable data (Go code is more readable than base64-encoded JSON)
  • Saves memory: the language data contains a large number of repeated string constants, which the compiler can optimize.

Cons:

  • A 12k-line source file may be unpleasant to look at, even if it is generated by a tool.
  • To generate the code, "scripts/include.go" uses a few tricks, such as copying the structure definitions from "processor/structs.go", which makes the code not particularly elegant.
  • The binary size increases by 200 KB.
  • Performance degrades when languageDatabase is not required, because the language data is now always loaded at startup. For example, scc --help is 10% slower than before. However, very few scenarios don't require language data.
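
One possible way to soften the last point, shown only as a sketch and not as what this PR does: in Go, a package-level map literal is populated during package initialization, so wrapping the generated literal in a function and guarding it with `sync.Once` would defer that work until the data is first needed. The names `buildLanguageDatabase`, `getLanguageDatabase`, and `loadCount` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Language is a simplified, hypothetical stand-in for scc's language record.
type Language struct {
	Extensions []string
}

var (
	loadOnce         sync.Once
	languageDatabase map[string]Language
	loadCount        int // counts how often the data was built (demo only)
)

// buildLanguageDatabase stands in for generated code: the literal lives
// inside a function, so nothing is constructed until the function is called.
func buildLanguageDatabase() map[string]Language {
	loadCount++
	return map[string]Language{
		"Go": {Extensions: []string{"go"}},
	}
}

// getLanguageDatabase builds the database on first use and reuses it after.
func getLanguageDatabase() map[string]Language {
	loadOnce.Do(func() {
		languageDatabase = buildLanguageDatabase()
	})
	return languageDatabase
}

func main() {
	// A code path such as --help would simply never call getLanguageDatabase.
	fmt.Println(len(getLanguageDatabase()), loadCount)
	fmt.Println(len(getLanguageDatabase()), loadCount)
}
```

Whether this recovers the measured 10% would need to be benchmarked; it also adds an accessor call to every lookup site.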

If you prefer compressed JSON data, that's fine. This code generator is only my weekend DIY :)

boytertesting bot added the VH/complexity (Very high complexity) and XL/size (Extra large change) labels on Feb 22, 2025
@boyter
Owner

boyter commented Feb 23, 2025

Yes, the slower startup due to the lack of lazy loading is one of the things I wanted to check. I'll have a look through this because it's something I have wanted for a while, especially since it removes the need for third parties to call the process constants method.

@apocelipes
Contributor Author

> Yes, the slower startup due to the lack of lazy loading is one of the things I wanted to check. I'll have a look through this because it's something I have wanted for a while, especially since it removes the need for third parties to call the process constants method.

Yes. If you want to dig in, take a closer look at the two files in the "scripts" directory. It's a simple code generator based on Go's template engine, which is enough for string and bool fields.

apocelipes force-pushed the code-generate branch 2 times, most recently from 98e2495 to 335d2ff on March 6, 2025 06:23
@apocelipes
Contributor Author

@boyter
I did a little cleanup. We can pass strconv.Quote to the template, so there is no need to create the function formatLanguage. We can also use processor.Language directly, so there is no need to copy the structs any more. This makes the "include.go" code much simpler and does not violate the DRY principle.
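
As an illustration of the first point, `strconv.Quote` can be registered in a `template.FuncMap` and called from the template. This is a sketch under assumptions: the `quote` function name, the template text, and the local `Language` struct (standing in for processor.Language) are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"text/template"
)

// Language is a local stand-in for processor.Language so the sketch compiles
// on its own; the cleanup described above imports the real type instead.
type Language struct {
	Name   string
	Quotes []string
}

// entryTemplate calls quote (bound to strconv.Quote) on every string, so
// characters such as " and \ are escaped as valid Go string literals.
const entryTemplate = `{{ quote .Name }}: {Quotes: []string{ {{- range .Quotes }}{{ quote . }},{{ end -}} }},`

// render executes the template for a single language entry.
func render(lang Language) (string, error) {
	tmpl, err := template.New("entry").
		Funcs(template.FuncMap{"quote": strconv.Quote}). // must precede Parse
		Parse(entryTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, lang); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(Language{Name: "Go", Quotes: []string{`"`, `\`}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```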

@boyter
Owner

boyter commented Mar 6, 2025

Neat. Being lazy paid off for me then. I have been slammed at work, so I have not had a chance to look through this yet. When I get a moment I'll run it through the benchmarks to ensure it covers all bases, and then merge if all is good.

@apocelipes
Contributor Author

Rebased.

@boyter
Owner

boyter commented Mar 26, 2025

Ah, thanks for that. I will be looking at this in depth next week. Just no time this week, sorry.
