Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add argument to specify the number of jobs being used for interfile #10230

Conversation

emjin
Copy link
Collaborator

@emjin emjin commented May 8, 2024

In combination with changes to semgrep-proprietary, this produces a speedup for interfile runs that use both intrafile and interfile rules. Instead of running both the intrafile and interfile rules with -j 1, it runs first the intrafile rules with the default number of jobs, and then the interfile rules with a single job. This conserves memory for the interfile rules while allowing the intrafile rules to benefit from multiprocessing.

TODO: we can remove the supply chain hack that disables interfile when possible to get multiprocessing

Test plan: see corresponding semgrep-proprietary PR

…file

In combination with changes to semgrep-proprietary, this produces a speedup for
interfile runs that use both intrafile and interfile rules. Instead of running
both the intrafile and interfile rules with `-j 1`, it runs first the intrafile
rules with the default number of jobs, and then the interfile rules with a single
job. This conserves memory for the interfile rules while allowing the intrafile
rules to benefit from multiprocessing.

TODO: we can remove the supply chain hack that disables interfile when possible
to get multiprocessing

Test plan: see corresponding semgrep-proprietary PR
Copy link
Contributor

github-actions bot commented May 8, 2024

PR checklist:

  • Purpose of the code is evident to future readers
  • Tests included or PR comment includes a reproducible test plan
  • Documentation is up-to-date
  • A changelog entry was added to changelog.d for any user-facing change
  • Change has no security implications (otherwise, ping security team)

If you're unsure about any of this, please see:

@r2c-argo
Copy link
Contributor

r2c-argo bot commented May 8, 2024

semgrep-compare-github-lkgwa results

Ran benchmark on 38 repositories

The number of findings differs for 4 repos

Whole benchmark is 1.1% slower (a bit of noise is expected)

Relative speed improvement is 0.97 on average

  • 3% of scans are significantly faster
  • 5% of scans are significantly slower

Relative memory improvement is 0.99 on average

Copy link
Collaborator

@aryx aryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this work given you didn't modify osemgrep.
All CLI arguents processing is now by default first done by osemgrep,
so I don't think there is a way right now to call semgrep with --interfile-jobs.
In fact the test plan does not contain such command.

@emjin emjin marked this pull request as draft May 10, 2024 21:42
@aryx aryx closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants