Create Debian/Ubuntu package dataset #4

Open
nightlark opened this issue Dec 5, 2024 · 1 comment

nightlark commented Dec 5, 2024

Create a package dataset for Debian/Ubuntu that maps file names to the package(s) that could have installed them. This relates to #5 and #8 for determining how file names should be normalized for use as a lookup "key". We should also consider splitting the dataset into smaller chunks based on how it will be used (e.g. only includes, only binary files, etc.).

Some potential sources of data for this are:

  • SQLite database from linux-package-analyzer
  • The gzipped Contents-amd64 file from http://security.ubuntu.com/ubuntu/dists/ for noble or oracular (sources may also have interesting info to add); the Debian equivalent is https://ftp.debian.org/debian/dists/stable-updates/main/
    • The package name given isn't the best one (it usually contains version information). In general the source package name is a better, more recognizable option, though there are exceptions: libstdc++-12-dev, for example, comes from the gcc-12 source package. A few special cases for well-known libraries like that may be needed.
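
To make the Contents-amd64 option concrete, here is a minimal Python sketch of reading such a file into a file-name-to-packages mapping. It assumes the usual Contents layout of one entry per line: a path, whitespace, then a comma-separated list of section/package locations. The function names are hypothetical, and keying on the bare file name is only a placeholder, since the normalization rules are still to be decided in #5 and #8.

```python
import gzip
from collections import defaultdict


def iter_contents_lines(path):
    """Yield text lines from a gzipped Contents file (e.g. Contents-amd64.gz)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def parse_contents_line(line):
    """Split one Contents line into (file path, [package names]).

    Returns None for blank lines or lines without a location column.
    """
    # The file path may contain spaces, so split on the LAST run of whitespace.
    parts = line.rstrip("\n").rsplit(None, 1)
    if len(parts) != 2:
        return None
    path, locations = parts
    # Each location looks like "section/package" (sometimes with an extra
    # leading component); keep only the final component as the package name.
    packages = [loc.rsplit("/", 1)[-1] for loc in locations.split(",")]
    return path, packages


def build_index(lines):
    """Map a lookup key to the set of packages that ship a matching file.

    Assumption: the key is the bare file name (last path component).
    """
    index = defaultdict(set)
    for line in lines:
        parsed = parse_contents_line(line)
        if parsed is None:
            continue
        path, packages = parsed
        index[path.rsplit("/", 1)[-1]].update(packages)
    return index
```

Calling `build_index(iter_contents_lines("Contents-amd64.gz"))` on a downloaded file would produce the mapping. The package names it yields are binary package names, so the source-package substitution and the special cases mentioned above would still need a separate step.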
monwen self-assigned this Dec 5, 2024
nightlark commented

Update or write a script that outputs the results as a SQLite database, with an index on the file name for fast lookups.
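
A minimal sketch of what that output step might look like, using Python's built-in sqlite3 module. The table name, column names, and index name are assumptions for illustration, not a decided schema.

```python
import sqlite3


def write_database(rows, db_path=":memory:"):
    """Store (file_name, package) pairs and index them by file name.

    The schema here (table "files", columns "file_name" and "package")
    is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "file_name TEXT NOT NULL, package TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO files (file_name, package) VALUES (?, ?)", rows
    )
    # Creating the index after the bulk insert avoids updating it row by row.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_files_file_name ON files (file_name)"
    )
    conn.commit()
    return conn


def lookup(conn, file_name):
    """Return the sorted, de-duplicated packages recorded for a file name."""
    cur = conn.execute(
        "SELECT DISTINCT package FROM files WHERE file_name = ? "
        "ORDER BY package",
        (file_name,),
    )
    return [row[0] for row in cur]
```

For example, `lookup(write_database(rows, "packages.db"), "zlib.h")` would return every package recorded for that key. An equality match on `file_name` can use the index instead of scanning the whole table.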


3 participants