Improve data source importing #17

Open
JuxhinDB opened this issue Sep 19, 2023 · 0 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@JuxhinDB
Member

Currently the NIST import is too slow, often taking many hours to load the dataset. Looking at the codebase, it appears that we're spawning multiple database transactions to import a single entry:

https://github.com/Exein-io/kepler/blob/558afe222b3c21c72a66d26ea1e93695d2c3751c/kepler/src/main.rs#L146-L187

Since a lot of these entries are completely independent of each other, we should batch-insert them into the database in a single transaction (even packing thousands of CVEs at a time).

INSERT INTO cves (columns)
VALUES
    (cve_1),
    (cve_2),
    ...
    (cve_n)
RETURNING * 

This results in a single BEGIN/COMMIT per chunk rather than several per CVE. Relational integrity is still preserved, because all rows of a chunk are written within the same transaction.
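To make the chunking idea concrete, here is a minimal sketch in plain Rust (standard library only) that splits rows into chunks and builds one parameterized multi-row INSERT per chunk. The `CveRow` struct, the `cve` and `summary` column names, and the `build_batch_insert` and `plan_batches` helpers are all hypothetical and are not taken from the kepler codebase. Executing each statement inside a transaction is left to whichever database layer the project uses.

```rust
/// Hypothetical, simplified CVE record (assumption: the real model has more fields).
#[derive(Debug, Clone)]
struct CveRow {
    cve: String,
    summary: String,
}

/// Builds one multi-row INSERT with PostgreSQL-style `$n` placeholders
/// for `rows` rows of `columns.len()` values each.
fn build_batch_insert(table: &str, columns: &[&str], rows: usize) -> String {
    let mut sql = format!("INSERT INTO {} ({}) VALUES ", table, columns.join(", "));
    for row in 0..rows {
        if row > 0 {
            sql.push_str(", ");
        }
        // Placeholders are numbered continuously across rows: ($1, $2), ($3, $4), ...
        let placeholders: Vec<String> = (1..=columns.len())
            .map(|col| format!("${}", row * columns.len() + col))
            .collect();
        sql.push('(');
        sql.push_str(&placeholders.join(", "));
        sql.push(')');
    }
    sql.push_str(" RETURNING *");
    sql
}

/// Splits `rows` into chunks of at most `chunk_size` and returns, per chunk,
/// the SQL text plus the flattened bind values in placeholder order.
/// Each (statement, values) pair is meant to run in its own transaction.
fn plan_batches(rows: &[CveRow], chunk_size: usize) -> Vec<(String, Vec<String>)> {
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    rows.chunks(chunk_size)
        .map(|chunk| {
            // Hypothetical column names, chosen only for illustration.
            let sql = build_batch_insert("cves", &["cve", "summary"], chunk.len());
            let values: Vec<String> = chunk
                .iter()
                .flat_map(|r| [r.cve.clone(), r.summary.clone()])
                .collect();
            (sql, values)
        })
        .collect()
}
```

With 2,500 rows and a chunk size of 1,000, `plan_batches` yields three statements, so the import would need three transactions instead of several per CVE. One thing to watch when picking the chunk size: PostgreSQL's wire protocol caps a single statement at 65,535 bind parameters, so the chunk size multiplied by the number of columns has to stay under that limit.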

@JuxhinDB JuxhinDB added good first issue Good for newcomers help wanted Extra attention is needed labels Sep 19, 2023