Data about packages and maintainers on PyPI

sethmlarson/pypi-data

Latest commit: 9e55a8e · Mar 2, 2025

PyPI Data

Mostly up-to-date data about almost every package on PyPI

The database is published as a gzipped SQLite file (pypi.db.gz) via GitHub releases.

$ gunzip pypi.db.gz
$ sqlite3 'pypi.db' 'SELECT * FROM packages LIMIT 10 OFFSET 1000;'

acid-vault|1.3.2|>=3.6|1|0|2021-01-21 04:37:10
acidcli|1.0.1|>=3.6|0|0|2021-01-21 04:37:10
acidfile|1.2.1||0|0|2021-01-21 04:37:10
acidfs|1||0|0|2021-01-21 04:37:10
acidoseq|1.3.7||0|0|2021-01-21 04:37:10
acinonyx|0.1.0|>=3.6.0|0|0|2021-01-21 04:37:10
aciops|2.0.0|>=3.6|0|0|2021-01-21 04:37:10
acitoolkit|0.4||0|0|2021-01-21 04:37:10
ackeras|0.1.1||0|0|2021-01-21 04:37:10
ackg|0.0.5||0|0|2021-01-21 04:37:10
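
The same query can also be run from Python's standard-library sqlite3 module. This is a minimal sketch, not part of the project: the function name page_of_packages is made up for illustration, and the ORDER BY name clause is an addition (the command above has none) so that paging is deterministic.

```python
import sqlite3


def page_of_packages(conn, limit=10, offset=1000):
    """Return one page of rows from the packages table as dicts.

    ORDER BY name is an assumption added here so that LIMIT/OFFSET
    always walks the table in the same order.
    """
    cur = conn.execute(
        "SELECT * FROM packages ORDER BY name LIMIT ? OFFSET ?",
        (limit, offset),
    )
    # cursor.description holds one 7-tuple per column; item 0 is the name.
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]


# Usage, assuming pypi.db has been gunzipped into the current directory:
#   conn = sqlite3.connect("pypi.db")
#   for row in page_of_packages(conn):
#       print(row["name"], row["version"])
```

Values in columns declared STRING may come back as int or float rather than str when they look numeric, because SQLite gives such columns NUMERIC affinity. That is likely why acidfs shows version 1 in the output above.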

Data being tracked

  • Name, Version, Upload Time
  • Direct requirements (requires_dist)
  • Extra requirements (requires_extras)
  • Python requirements (requires_python)
  • Trove classifiers (classifiers)
  • Yanked versions (yanked)
  • Wheel data (build_tag, python_tags, abi_tags, platform_tags)
  • Maintainers on PyPI
  • URLs used by packages
  • OpenSSF scorecard data
  • Google Assured OSS
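
To check how the items above map onto tables and columns in a downloaded copy, the schema can be read back from the database itself. A small sketch using only the standard library; describe_tables is a hypothetical helper name, not something the project ships.

```python
import sqlite3


def describe_tables(conn):
    """Map each table name in the database to its list of column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # The table names come from sqlite_master, so quoting is enough here.
        info = conn.execute(f'PRAGMA table_info("{table}")')
        schema[table] = [col[1] for col in info]
    return schema


# Usage, assuming pypi.db is in the current directory:
#   for table, columns in describe_tables(sqlite3.connect("pypi.db")).items():
#       print(table, columns)
```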

Database Schemas

-- Packages --
CREATE TABLE packages (
    name STRING,
    version STRING,
    requires_python STRING,
    yanked BOOLEAN DEFAULT FALSE,
    has_binary_wheel BOOLEAN,
    has_vulnerabilities BOOLEAN,
    first_uploaded_at TIMESTAMP,
    last_uploaded_at TIMESTAMP,
    recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    downloads INTEGER,
    scorecard_overall FLOAT,
    in_google_assured_oss BOOLEAN,
    PRIMARY KEY (name)
);

-- Dependencies --
CREATE TABLE deps (
    package_name STRING,
    extra STRING DEFAULT NULL,
    dep_name STRING,
    dep_specifier STRING,
    PRIMARY KEY (package_name, dep_name, dep_specifier)
);

-- Wheel data --
CREATE TABLE wheels (
    package_name STRING,
    filename STRING,
    build STRING,
    python STRING,
    abi STRING,
    platform STRING,
    uploaded_at TIMESTAMP,
    PRIMARY KEY (package_name, filename)
);

-- Maintainer data --
CREATE TABLE maintainers (
    name STRING,
    package_name STRING
);

-- Package URLs --
CREATE TABLE package_urls (
    package_name STRING,
    name STRING,
    url STRING,
    public_suffix STRING
);

-- OpenSSF Scorecard --
CREATE TABLE scorecard_checks (
    package_name STRING,
    name STRING,
    score INTEGER
);

-- Trove Classifiers --
CREATE TABLE classifiers (
    package_name TEXT,
    name TEXT,
    PRIMARY KEY (package_name, name),
    FOREIGN KEY (package_name) REFERENCES packages(name)
);
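
Two lookups that these schemas make possible, sketched in Python. The helper names are hypothetical, and two assumptions are baked in: a row in deps whose extra is NULL is a direct requirement, and deps.dep_name and maintainers.package_name use the same spelling as packages.name.

```python
import sqlite3


def reverse_dependencies(conn, dep_name, include_extras=False):
    """List packages that declare a dependency on dep_name."""
    sql = "SELECT DISTINCT package_name FROM deps WHERE dep_name = ?"
    if not include_extras:
        # Assumption: extra IS NULL marks a requirement that is not
        # gated behind an optional extra.
        sql += " AND extra IS NULL"
    sql += " ORDER BY package_name"
    return [row[0] for row in conn.execute(sql, (dep_name,))]


def packages_by_maintainer(conn, maintainer):
    """List (name, version) for every package a maintainer is attached to."""
    cur = conn.execute(
        "SELECT p.name, p.version "
        "FROM maintainers AS m "
        "JOIN packages AS p ON p.name = m.package_name "
        "WHERE m.name = ? "
        "ORDER BY p.name",
        (maintainer,),
    )
    return cur.fetchall()
```

Both functions take an open connection, for example sqlite3.connect("pypi.db"), and use bound parameters rather than string formatting for the user-supplied values.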

Download data

Download counts are taken from https://github.com/hugovk/top-pypi-packages, so they are only available for the top 5,000 packages.
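
Because most rows have no download count, queries on the downloads column should skip the empty ones. A hedged sketch: most_downloaded is a hypothetical helper, and since how missing counts are stored (NULL or 0) is not stated here, the filter downloads > 0 is chosen to exclude both.

```python
import sqlite3


def most_downloaded(conn, limit=10):
    """Return (name, downloads) for the packages with the highest counts."""
    cur = conn.execute(
        "SELECT name, downloads FROM packages "
        # NULL > 0 evaluates to NULL, so rows without a count are dropped too.
        "WHERE downloads > 0 "
        "ORDER BY downloads DESC, name "
        "LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Usage, assuming pypi.db is in the current directory:
#   for name, downloads in most_downloaded(sqlite3.connect("pypi.db")):
#       print(name, downloads)
```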

Running locally

$ docker build -t pypi-data .
$ docker run --rm pypi-data

License

Apache-2.0