Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add repo dependency ingester #5030

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

evankanderson
Copy link
Member

Summary

Add a dedicate ingestion mode for repository dependencies which uses https://github.com/google/osv-scalibr to extract the dependencies from various lockfile and other dependency files in the repository.

See this design doc for the long term direction on dependency ingest; this is part of "stage 1" (but needs tests)

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Still work in progress... Feel free to pick this up and patch it in while I am out.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@coveralls
Copy link

coveralls commented Nov 23, 2024

Coverage Status

coverage: 54.266% (-0.3%) from 54.525%
when pulling c09e0a5 on evankanderson:deps-ingest
into a3fbb21 on mindersec:main.

Identifiers: map[int32]string{
int32(sbom.SoftwareIdentifierType_PURL): inv.Extractor.ToPURL(inv).String(),
// TODO: scalibr returns a _list_ of CPEs, but protobom will store one.
// use the first?
Copy link
Contributor

@puerco puerco Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the most specific one. Now that package URL has official support for ranges, the one-purl-per-package rule is now obsolete. This means that, perhaps, we should create a new identifier list to capture more than one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused about what you're suggesting here. I can see why PURLs supporting ranges would be useful in general, but I'm not sure if you're suggesting:

  1. That upstream protobom needs a change in the identifiers type.
  2. That Scalibr needs a change in the ToPURL function.
  3. That this code should be parsing the purl / Inventory and creating multiple Node objects when the Inventory contains certain types of data.

Looking at the scalibr code, it looks like CPEs are only reported from the SPDX and CDX extractors, so multiple (or > 0) CPEs seems moot for our current use cases. :puzzled:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant number 1, but there are use cases when you may also want to return purls with ranges, more than one purl, and/or more than one CPE. It all depends on what you are trying to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants