Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: separate golang license caches from mod dir #2852

Merged
merged 9 commits into from
Jun 12, 2024

Conversation

kzantow
Copy link
Contributor

@kzantow kzantow commented May 7, 2024

Previously, license handling in Golang had a few quirks and may not function if the go mod directory did not exist.

This PR makes the following changes:

  • Separate local go mod cache directory from remote download cache
  • Use known cache directory using XDG if not specified in config; create the directory if it does not exist
  • No longer download files into a go mod directory, only the syft cache

TODO:

  • ensure this cache directory is excluded during directory scans (e.g. syft /)
  • ascertain if we need a syft cache command to view/manage the cache as part of this PR (answer: no, we are caching significantly less data than syft was previously)
  • add more tests
  • remove writable fileresolver usage and just directly use fs.FS implementations

Fixes: #2798 #1933

Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I feel like this is something we can build on over time too.

Since this PR doesn't introduce mechanisms for cache invalidation, users will start to have new cache directories that will continue to grow (this is different than grype, where new DBs get replaced).

I feel like before merging this there should be an answer to how a user would delete the cache or how syft would automatically do this... this could be:

  • like you brought up on the call, allow for TTL configuration and track in the cache dir track last write date so we can clean up periodically.
  • maybe add a cache list and cache delete subcommands to syft

@kzantow kzantow force-pushed the fix/go-licenses-without-mod-dir branch from 2943b4e to 02b13e9 Compare June 11, 2024 18:03
…s originated, cache resolver includes version, fix some git download cases

Signed-off-by: Keith Zantow <[email protected]>
Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache foundation is awesome ❤️ 🚀

@kzantow kzantow merged commit ca0cc52 into anchore:main Jun 12, 2024
11 checks passed
@kzantow kzantow deleted the fix/go-licenses-without-mod-dir branch June 12, 2024 23:12
@Joerki
Copy link

Joerki commented Jun 14, 2024

Hi @wagoodman, @kzantow

This is great! I feel like this is something we can build on over time too.

Since this PR doesn't introduce mechanisms for cache invalidation, users will start to have new cache directories that will continue to grow (this is different than grype, where new DBs get replaced).

I feel like before merging this there should be an answer to how a user would delete the cache or how syft would automatically do this... this could be:

  • like you brought up on the call, allow for TTL configuration and track in the cache dir track last write date so we can clean up periodically.
  • maybe add a cache list and cache delete subcommands to syft

Thanks for the info,
FYI: Good to know. In my case I don't need to do an explicit clean-up step, since my CI pipeline agent throws away all local data after doing its jobs and becomes unused for some minutes.

For other users it might be good to have a choice

  • to discard the cache (at the end of execution "I really don't want to have it anymore")
  • to keep it after execution (e.g. as default, "I want to use possible cached modules and collect further on")
  • or to reset/remove it (at the beginning of execution, "I want to forget cached modules up to now, but want consider cached modules in the future until my next reset")
    during certain build steps.
    I might prefer a command line option or config setting that set one of possible option above.
    With a single command (cache delete) I see a possible, unwanted effect when
  • e.g. a pipeline stops because of a failing step
  • or a final cleanup step is skipped or gets forgotton

BR,
Jörg

@kzantow
Copy link
Contributor Author

kzantow commented Jun 14, 2024

Hey @Joerki -- we are going to improve the configuration in the near future. The current configuration options allows you to achieve the different options you mentioned. There is a cache directory (SYFT_CACHE_DIR) configuration option (you can always see this now using syft config). So you can give it an explicit dir that gets cleaned up or cached between CI runs or a temporary directory that gets deleted. Or you can just cache/delete the default syft cache directory.

There is also a cache TTL, specifying the time-to-live of cache items, but this does not delete them currently. We plan to add a syft cache purge or similar command to do this explicitly.

And there's an obvious CACHE_ENABLED we could add; I'll get that done right away :) EDIT: I chose not to add this, since the different behavior options can be covered by logical values in the Dir / TTL configuration and we may want to add different types, such as remote cache.

spiffcs added a commit that referenced this pull request Jun 14, 2024
* main:
  chore(deps): update tools to latest versions (#2961)
  chore(deps): bump github/codeql-action from 3.25.9 to 3.25.10 (#2964)
  feat: index known CPEs for wordpress plugins and themes (#2963)
  fix(golang): improve version extraction from ldflags for pingcap TiDB (#2962)
  chore(deps): bump actions/checkout from 4.1.6 to 4.1.7 (#2955)
  chore(deps): bump github/codeql-action from 3.25.8 to 3.25.9 (#2956)
  fix: separate golang license caches from mod dir (#2852)
  chore(deps): bump github.com/vbatts/go-mtree from 0.5.3 to 0.5.4 (#2952)
  chore(deps): update tools to latest versions (#2949)
  chore(deps): bump modernc.org/sqlite from 1.30.0 to 1.30.1 (#2950)
spiffcs added a commit that referenced this pull request Jun 14, 2024
* main:
  chore(deps): update tools to latest versions (#2961)
  chore(deps): bump github/codeql-action from 3.25.9 to 3.25.10 (#2964)
  feat: index known CPEs for wordpress plugins and themes (#2963)
  fix(golang): improve version extraction from ldflags for pingcap TiDB (#2962)
  chore(deps): bump actions/checkout from 4.1.6 to 4.1.7 (#2955)
  chore(deps): bump github/codeql-action from 3.25.8 to 3.25.9 (#2956)
  fix: separate golang license caches from mod dir (#2852)
  chore(deps): bump github.com/vbatts/go-mtree from 0.5.3 to 0.5.4 (#2952)
  chore(deps): update tools to latest versions (#2949)
  chore(deps): bump modernc.org/sqlite from 1.30.0 to 1.30.1 (#2950)

Signed-off-by: Christopher Phillips <[email protected]>
@kzantow
Copy link
Contributor Author

kzantow commented Jun 14, 2024

@Joerki I made another PR to make a couple configuration options work in a more expected manner and be more explicit to support the cases you've mentioned: #2966

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Golang: Search remote licenses not working in a CI pipeline when scanning Docker image
3 participants