
High level flow diagram #4

Open
pombredanne opened this issue Feb 9, 2024 · 3 comments

pombredanne commented Feb 9, 2024

The attached diagram presents the high-level flow between the various parts: VCIO, PurlDB, and FederatedCode.

federatedcode-flow
federatedcode-flow.odp


pombredanne commented Nov 20, 2024

Here are the steps we are going through, at a high level:

Process to populate Git repos

  1. In VCIO
  • command line to export from VCIO
  • outcome: vulnerabilities and minimal package data are saved on disk as YAML, committed in the backing git repo(s), then pushed back
  2. In SCIO
  • add-on pipeline to export a project output from SCIO
  • outcome: detailed package data is saved on disk as YAML, committed in the backing git repo(s), then pushed back (see the sketch after this list)
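As a rough illustration only, here is a minimal sketch of what this export step could look like, assuming per-PURL YAML files written into a local clone of the backing repo; the file layout, helper names, and commit workflow are assumptions, not the actual VCIO/SCIO code.

```python
# Minimal sketch of the export step, assuming per-PURL YAML files committed to
# a local clone of the backing git repo. Layout and names are hypothetical.
import subprocess
from pathlib import Path

import yaml  # PyYAML


def write_package_yaml(repo_dir: Path, purl: str, data: dict) -> Path:
    # Assumed layout: pkg:pypi/django@4.2 -> pypi/django/4.2.yml
    relative = purl.removeprefix("pkg:").replace("@", "/") + ".yml"
    out_path = repo_dir / relative
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(yaml.safe_dump(data, sort_keys=True))
    return out_path


def commit_and_push(repo_dir: Path, message: str) -> None:
    # Commit whatever was written above and push it back to the backing repo.
    subprocess.run(["git", "-C", str(repo_dir), "add", "--all"], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "commit", "-m", message], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "push"], check=True)
```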

Process to advertise Package and Vulnerability data:

  1. In FederatedCode
  • command line to "sync" the backing git repos

    • For Vulnerabilities
    • For Packages
  • outcome: repos are cloned locally and the data is pulled into the FederatedCode database, ideally looking only at changes and data pointers, not the whole data

  2. In FederatedCode
  • command line to "federate" the data

    • For Vulnerabilities
    • For Packages
  • outcome: new and updated data is advertised in the Fediverse as the stream of events impacting each package (see the sketch after this list)
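Not the actual implementation, but a minimal sketch of what the "federate" step could look like if each package update is announced as an ActivityPub Create activity delivered to follower inboxes; the actor, inbox URL, and payload fields are assumptions.

```python
# Sketch of the "federate" step: announce a new/updated package as an
# ActivityPub Create activity pushed to a follower inbox. The URLs and payload
# fields here are assumptions for illustration.
import json

import requests


def build_package_activity(actor: str, purl: str, summary_url: str) -> dict:
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "object": {
            "type": "Note",
            "name": purl,
            "content": f"Package update for {purl}",
            "url": summary_url,  # points at the YAML in the backing git repo
        },
    }


def deliver(activity: dict, inbox_url: str) -> None:
    # A real delivery would also sign the request (HTTP Signatures).
    requests.post(
        inbox_url,
        data=json.dumps(activity),
        headers={"Content-Type": "application/activity+json"},
        timeout=30,
    )
```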

Process to retrieve Package and Vulnerability data in PULL mode:

  1. In a PurlDB instance
  • command line to retrieve "federated" data from FederatedCode, possibly for a single PURL (or possibly as an on-demand API endpoint?)
  • outcome:
    • the summary data is pulled straight from the FederatedCode git repos, using the PURL as a key
    • the scan details are fetched from the FederatedCode git repos
    • the scans are imported into the DB and packages are created
  2. In a VCIO instance
  • command line to retrieve "federated" data from FederatedCode, possibly for a single PURL (or possibly as an on-demand API endpoint?)
  • outcome:
    • the summary data is pulled straight from the FederatedCode git repos, using the PURL as a key
    • the vulnerable package versions and vulnerability details are fetched from the FederatedCode git repos (and in the future also advisories?)
    • the data is imported into the DB, and vulnerabilities with packages and their relationships are created (see the sketch after this list)
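As a sketch of the PULL-mode lookup, assuming the FederatedCode repos store one YAML file per package version under a path derived from the PURL (the actual layout may differ), using the packageurl-python library:

```python
# Sketch of the PULL-mode lookup: map a PURL to a file path inside a locally
# cloned FederatedCode repo and load its YAML summary. The directory layout is
# an assumption; the real repos may be organized differently.
from pathlib import Path

import yaml
from packageurl import PackageURL  # packageurl-python


def summary_path_for_purl(repo_dir: Path, purl: str) -> Path:
    parsed = PackageURL.from_string(purl)
    parts = [parsed.type, parsed.namespace, parsed.name, parsed.version]
    relative = "/".join(part for part in parts if part)
    return repo_dir / f"{relative}.yml"


def load_summary(repo_dir: Path, purl: str) -> dict:
    return yaml.safe_load(summary_path_for_purl(repo_dir, purl).read_text())


# Example: load_summary(Path("clones/federatedcode-packages"), "pkg:pypi/django@4.2")
```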

Process for PurlDB and VCIO to obtain federated Package and Vulnerability data in PUSH mode:

We will first need a process for PurlDB and VCIO to subscribe to federated Package and Vulnerability data. Once this is done, federated messages should be processed as explained below.

  1. In a PurlDB instance
  • command line or API to subscribe to "federated" data from FederatedCode

  • command line to receive "federated" data from FederatedCode, or an endpoint that is part of the fediverse and can receive ActivityPub messages. This covers only the packages we have subscribed to in FederatedCode

  • outcome:

    • the summary ActivityPub data is received as a push from FederatedCode
    • the data is then updated as in PULL mode:

      • the summary data is pulled straight from the FederatedCode git repos, using the PURL as a key
      • the scan details are fetched from the FederatedCode git repos
      • the scans are imported into the DB and packages are created
  2. In a VCIO instance
  • command line or API to subscribe to "federated" data from FederatedCode

  • command line to receive "federated" data from FederatedCode, or an endpoint that is part of the fediverse and can receive ActivityPub messages. This covers only the packages we have subscribed to in FederatedCode

  • outcome:

    • the summary ActivityPub data is received as a push from FederatedCode
    • the data is then updated as in PULL mode:

      • the summary data is pulled straight from the FederatedCode git repos, using the PURL as a key
      • the vulnerable package versions and vulnerability details are fetched from the FederatedCode git repos (and in the future also advisories?)
      • the data is imported into the DB, and vulnerabilities with packages and their relationships are created (see the inbox sketch below)
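A minimal sketch of the receiving side, assuming a Django view acting as the ActivityPub inbox in PurlDB or VCIO; the view name, payload fields, and the queue helper are hypothetical.

```python
# Sketch of a fediverse-facing inbox endpoint in PurlDB or VCIO that receives
# pushed ActivityPub messages for subscribed packages, then triggers the same
# import path as PULL mode. Names and payload fields are hypothetical.
import json

from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


@csrf_exempt
@require_POST
def activitypub_inbox(request):
    if request.content_type != "application/activity+json":
        return HttpResponse(status=415)
    activity = json.loads(request.body)
    # Assumes the activity carries the PURL in object.name, as in the
    # federation sketch above.
    purl = activity.get("object", {}).get("name", "")
    queue_federated_update(purl)  # hypothetical helper: fetch and import as in PULL mode
    return JsonResponse({"status": "accepted"}, status=202)
```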

Process to "curate" data:

One design option is to consider VCIO curations as advisories made by a person or org; we would then eventually have many advisories from the actual VCIO data sources plus other "federated" advisories. This will require some work-in-progress changes to the VCIO models.

  • Curate vulnerabilities in FederatedCode and VCIO
  • Curate packages in FederatedCode and PurlDB

pombredanne commented

@ziadhany fyi, we need to make this set of flows clear so we can write the doc. Let's chat


ziadhany commented Nov 21, 2024

Sure, @pombredanne, let's have a chat and finalize the flows.

Process to advertise Package and Vulnerability data:

1. In FederatedCode

* command line to "sync" the backing git repos

  * [x]  For Vulnerabilities
  * [ ]  For Packages

* outcome: repos are cloned locally and the data is pulled into the database, looking only at changes

2. In FederatedCode

* command line to "federate" the data

  * [x]  For Vulnerabilities
  * [ ]  For Packages

For Packages in VCIO: we had this before, but we changed the file structure slightly. I think we need thorough testing to catch any bugs, especially in the importer (sync). Additionally, we should ensure robust testing of the federate functionality to avoid issues when federating messages.

Process to retrieve Package and Vulnerability data in PULL mode:

1. In PurlDB

* [ ]  command line to retrieve "federated" data from FederatedCode, possibly for a single PURL (or possibly as an on-demand API endpoint?)

* outcome:
  
  * the summary data is pulled from FederatedCode
  * the scan details are fetched from the backing git repos
  * the scans are imported into the DB and packages are created


2. In VCIO

* [ ]  command line to retrieve "federated" data from FederatedCode, possibly for a single PURL (or possibly as an on-demand API endpoint?)

* outcome:
  
  * the summary data is pulled from FederatedCode
  * the vulnerable package versions and vulnerability details are fetched from the backing git repos (what about advisories?)
  * the data is imported into the DB, and vulnerabilities, packages, and their relationships are created

We have an endpoint for this (sync): /repository/{repo-id}/sync-repo/ (a POST form request triggered by clicking 'sync'). This endpoint pulls the Git repository data, then runs the importer script, which fetches the diff and processes only the unprocessed commits. It then creates the vulnerability and package relations. However, we need to create tests to catch any bugs, and we need to double-check the relations we want to store for VCIO. We also need to determine what we will store for SCIO/PurlDB. A rough sketch of the diff-based processing is below.
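This is not the actual importer, just a sketch of the diff-based part, assuming the last processed commit is stored somewhere and only the YAML files changed since then are reimported:

```python
# Sketch of the diff-based import behind the sync endpoint: pull the repo,
# list only the files changed since the last processed commit, and hand those
# to the importer. How the last commit is stored is an assumption.
import subprocess
from pathlib import Path


def pull_and_list_changes(repo_dir: Path, last_commit: str) -> list[str]:
    subprocess.run(["git", "-C", str(repo_dir), "pull", "--ff-only"], check=True)
    diff = subprocess.run(
        ["git", "-C", str(repo_dir), "diff", "--name-only", f"{last_commit}..HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in diff.stdout.splitlines() if line.endswith(".yml")]


def current_head(repo_dir: Path) -> str:
    head = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return head.stdout.strip()  # store this as the new "last processed" commit
```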

Process for PurlDB and VCIO to obtain federated Package and Vulnerability data in PUSH mode:

We will need first a process for PurlDB and VCIO to subscribe to federated Package and Vulnerability data. Then once this is done we should get federated messages processed as explained below.

1. In PurlDB


* [ ]  command line to receive "federated" data from FederatedCode, or an endpoint that is part of the fediverse and can receive ActivityPub messages

* outcome:
  
  * the summary data is received from FederatedCode
  * the data is updated as in PULL mode


2. In VCIO


* [ ]  command line to receive "federated" data from FederatedCode, or an endpoint that is part of the fediverse and can receive ActivityPub messages

* outcome:
  
  * the summary data is received from FederatedCode
  * the data is updated as in PULL mode

I think we should have an endpoint in VCIO and PurlDB that updates the vulnerability or package after it is reviewed and accepted on FederatedCode. Then VCIO will push the changes to the Git repo, and FederatedCode will sync the repo and update the relation. A rough sketch of what that could look like is below.
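As a sketch of that proposal only, assuming a Django view in VCIO/PurlDB that applies an accepted FederatedCode curation and pushes the resulting change back to the backing repo; every name and field here is hypothetical:

```python
# Sketch of the proposed endpoint: apply a curation that was reviewed and
# accepted on FederatedCode, then push the change to the backing git repo so
# FederatedCode can sync it back. All names and fields are hypothetical.
import subprocess
from pathlib import Path

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


@csrf_exempt
@require_POST
def apply_accepted_curation(request, curation_id):
    curation = load_accepted_curation(curation_id)  # hypothetical lookup
    repo_dir = Path(curation["repo_dir"])
    target = repo_dir / curation["path"]
    target.write_text(curation["curated_yaml"])
    subprocess.run(["git", "-C", str(repo_dir), "add", str(target)], check=True)
    subprocess.run(
        ["git", "-C", str(repo_dir), "commit", "-m", f"Apply curation {curation_id}"],
        check=True,
    )
    subprocess.run(["git", "-C", str(repo_dir), "push"], check=True)
    return JsonResponse({"status": "pushed"})
```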

Process to "curate" data:

* [ ]  Curate vulnerabilities in FederatedCode and VCIO

* [ ]  Curate packages in FederatedCode and PurlDB

I'm not sure about this, but it depends on many factors. Should we rely on the FederatedCode review or the GitHub repo review (pull request) and treat Git as the source of truth? I was thinking we could have both mechanisms. For example, if we create a review in FederatedCode, it could trigger one in GitHub. However, I believe this approach might lead to issues, such as message sync problems and merge conflicts. I think it might be better to rely on just one and set up a GitHub action/trigger on merge.
