Running out of space on PyPI #2225

Open
15 tasks
wjones127 opened this issue Apr 18, 2024 · 3 comments

wjones127 commented Apr 18, 2024

We have run out of space on PyPI. I have requested additional space (pypi/support#3920), but our release strategy is still unsustainable.

Right now we make releases 1-4 times per week. Each release is 140 MB. At about two releases per week, it takes just 35 weeks to hit the 10GB limit. This issue proposes how we can reduce the frequency while still hitting our other goals with releases.

Monthly stable releases

We can reduce release frequency to about once a month. Keeping the release size the same, that gives us over 5 years to reach 10GB.
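The arithmetic behind both estimates can be checked with a short sketch. It assumes 1 GB = 1000 MB and takes two releases per week as a rough midpoint of the 1-4 range; both are assumptions for illustration, not figures from PyPI.

```python
RELEASE_SIZE_MB = 140   # size of one release, as stated in this issue
QUOTA_MB = 10 * 1000    # 10 GB limit, assuming 1 GB = 1000 MB


def weeks_until_full(releases_per_week: float) -> float:
    """Weeks until the quota is used up at a constant release rate."""
    releases_that_fit = QUOTA_MB / RELEASE_SIZE_MB
    return releases_that_fit / releases_per_week


# Current cadence: 1-4 releases per week; 2 per week is an assumed midpoint.
current_weeks = weeks_until_full(2)

# Proposed cadence: about 12 releases per year.
monthly_years = weeks_until_full(12 / 52) / 52

print(f"{current_weeks:.1f} weeks at 2 releases/week")
print(f"{monthly_years:.1f} years at 1 release/month")
```

At four releases per week the same function gives roughly half the time, so the 35 week figure is closer to a typical case than a worst case.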

Monthly is a rough schedule. We can make stable releases more frequently at times if the need arises, but because of space constraints we should avoid this when not necessary.

Besides size savings, we will also have more time to publicize the release. We currently don't write many blog posts publicizing new features or doing a roundup of important changes. With a longer official release cycle, we can spend a couple of days or a week writing these.

Frequent beta releases

We benefit a lot from being responsive to users and getting fixes and new features out fast. It would be a shame if we had to stop that.

One way we can still serve them is by shipping beta wheels. We can host them on https://fury.co/, which is the same service that PyArrow uses.

We don't necessarily have to build nightly. We can keep triggering releases manually, to keep CI costs down. This way we can keep our total release frequency the same.

Users will then be able to install with --pre --extra-index-url https://pypi.fury.io/some-repo/. The --pre flag tells pip it can install beta releases. The --extra-index-url flag tells pip where to look for additional binaries. The same flags can be added to requirements.txt.
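As a minimal sketch of how those flags fit together, the helper below builds the pip invocation as an argument list. The function name is hypothetical, the URL is the placeholder from the paragraph above, and `lance` is used as the package name only because it appears that way later in this issue.

```python
import sys


def pip_install_command(requirement: str, extra_index_url: str) -> list[str]:
    """Build a pip command that may pick up beta wheels from an extra index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--pre",                                # allow pre-release versions
        "--extra-index-url", extra_index_url,   # the URL must directly follow its flag
        requirement,
    ]


cmd = pip_install_command("lance", "https://pypi.fury.io/some-repo/")
print(" ".join(cmd[1:]))
```

The list can be passed to `subprocess.run(cmd, check=True)` to actually perform the install. The point of the sketch is the ordering: `--extra-index-url` takes the URL as its argument, so `--pre` cannot sit between them.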

We can also put the last N beta releases on PyPI, so users just have to put the --pre flag to get the latest. There is a script pypi-cleanup that can help us automatically delete older versions. However, it's probably safest to not point users to that, since they might pin to a version that will be deleted.
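The "last N betas" policy could be sketched as a selection step that decides which versions a cleanup run should remove. This is only an illustration: the function name is hypothetical, it handles only `X.Y.ZbN` version strings, and the deletion itself would still be left to a tool such as pypi-cleanup.

```python
import re

_BETA = re.compile(r"^(\d+)\.(\d+)\.(\d+)b(\d+)$")


def betas_to_delete(versions: list[str], keep: int) -> list[str]:
    """Return the beta versions that fall outside the newest `keep` betas."""
    betas = []
    for version in versions:
        match = _BETA.match(version)
        if match:  # stable releases do not match, so they are never selected
            key = tuple(int(group) for group in match.groups())
            betas.append((key, version))
    betas.sort()  # oldest first: (major, minor, patch, beta number)
    cutoff = max(len(betas) - keep, 0)
    return [version for _, version in betas[:cutoff]]


published = ["0.10.16", "0.10.17b1", "0.10.17b2", "0.10.17b3", "0.10.17", "0.10.18b1"]
print(betas_to_delete(published, keep=2))
```

With `keep=2` the two oldest betas are selected for removal, which is exactly the case where a user who pinned `0.10.17b1` would find their pin broken.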

We should provide the same platform support on betas as we do for regular releases. The better we support betas, the less pressure we will have to make more frequent regular releases.

Call them nightly or beta? In PyPI you can have .dev, .alpha, .beta releases. Maybe .beta will be more appealing than .dev for users? Ref: https://peps.python.org/pep-0440/#pre-releases We could also call them "preview", which is what PyTorch does.

Could we keep the last X beta releases on PyPI? Then users wouldn't need the flag.

Impact on LanceDB

LanceDB is in a transitional period right now, where it both depends on Lance (for the sync API) and compiles Rust code (for the async API). I think the ideal end goal will be to have LanceDB Python be implemented entirely in Rust. Then we can loosely pin the version of Lance.

Because LanceDB's package size is now on par with Lance, we will likely want to similarly reduce official release frequency of LanceDB.

We could do the version pin in LanceDB one of two ways:

  1. Keep the version pinned to an exact version, including beta specifier. We could upgrade LanceDB to pin to beta versions of Lance in between releases. This requires more work to make a LanceDB release, but less care when dealing with versions in Lance.
  2. Loosen the pin to ~=0.MINOR.PATCH. For ~=0.10.16, this would allow 0.10.17.beta1 and 0.10.17, but not 0.11.0. Thus, we can use minor versions when there are breaking changes. This requires that we are careful with versioning, and probably that we test the Lance repo against LanceDB in CI.
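The behavior of the second option can be illustrated with a simplified, standard-library-only check. It is a sketch under assumptions: it only understands `X.Y.Z` and `X.Y.ZbN` strings (the normalized spelling of `.beta1`), the function names are hypothetical, and real tooling should rely on pip or the `packaging` library rather than this code.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:b(\d+))?$")


def parse(version: str):
    """Split 'X.Y.Z' or 'X.Y.ZbN' into a sortable (release, pre) key."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    major, minor, patch, beta = match.groups()
    release = (int(major), int(minor), int(patch))
    # A beta sorts before the final release with the same number.
    pre = (0, int(beta)) if beta is not None else (1, 0)
    return release, pre


def compatible(candidate: str, base: str, allow_pre: bool = False) -> bool:
    """Simplified check of `candidate` against `~=base` for X.Y.Z versions."""
    cand_release, cand_pre = parse(candidate)
    base_key = parse(base)
    if cand_pre[0] == 0 and not allow_pre:
        return False  # betas only count when pre-releases are enabled (pip's --pre)
    same_series = cand_release[:2] == base_key[0][:2]  # the ==X.Y.* half of ~=
    new_enough = (cand_release, cand_pre) >= base_key  # the >=X.Y.Z half of ~=
    return same_series and new_enough


for candidate in ["0.10.17b1", "0.10.17", "0.11.0"]:
    print(candidate, compatible(candidate, "0.10.16", allow_pre=True))
```

The split into "same series" and "new enough" mirrors how a compatible-release pin expands into `>=0.10.16, ==0.10.*`, which is why a minor bump is enough to keep a breaking change away from LanceDB.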

For now, changes in Lance will affect LanceDB and need to be tested.

Versioning schema

To support beta releases, we need to change how versioning works:

  • Change main release script to patch increment after release
  • Create triggered job to perform beta release to fury.io
  • Whenever a breaking change is made, require a minor version bump
  • Add a CI job to Lance that runs LanceDB's test suite against current version (Python & Rust)
  • Move LanceDB to pinning with ~=0.MINOR.PATCH
  • Add a version drop down for docs, publish dev documentation

Here's an example sequence of version changes:

v0.10.16 (release)
(increment version)
v0.10.17b1 (nightlies)
v0.10.17b2
v0.10.17b3
v0.10.17 (release)
(increment patch version)
v0.10.18b1 (nightlies)
(breaking change -> increment minor)
v0.11.0b1
v0.11.0b2
v0.11.0 (release)
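The sequence above can be reproduced with a small sketch of the bump rules. The function names are hypothetical and this is not the project's release script; it also assumes that a beta with patch number 0 means the minor version was already bumped for a breaking change.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:b(\d+))?$")


def _parse(version: str):
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    major, minor, patch, beta = match.groups()
    return int(major), int(minor), int(patch), (int(beta) if beta is not None else None)


def next_beta(version: str, breaking: bool = False) -> str:
    """Version of the next beta build, given the current version."""
    major, minor, patch, beta = _parse(version)
    if beta is None:
        # A stable release was just made: open the next patch series.
        patch, beta = patch + 1, 0
    if breaking and patch != 0:
        # First breaking change since the last release: move to the next minor.
        minor, patch, beta = minor + 1, 0, 0
    return f"{major}.{minor}.{patch}b{beta + 1}"


def release(version: str) -> str:
    """Stable version obtained by dropping the beta suffix."""
    major, minor, patch, _ = _parse(version)
    return f"{major}.{minor}.{patch}"


v = next_beta("0.10.16")          # 0.10.17b1
v = next_beta(next_beta(v))       # 0.10.17b3
v = release(v)                    # 0.10.17
v = next_beta(v)                  # 0.10.18b1
v = next_beta(v, breaking=True)   # 0.11.0b1
v = release(next_beta(v))         # 0.11.0b2, then 0.11.0
print(v)
```

A second breaking change while already on `0.11.0b1` just produces `0.11.0b2`, since the minor bump only needs to happen once per release cycle.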

How will this work for users?

Users who just want to use the monthly stable releases can do:

lance==0.10.17
lancedb==0.4.13

Users who want to use the latest releases will be able to do:

-i https://pypi.fury.io/lancedb/ --pre lance==0.10.17
lancedb==0.4.13

This will get them the latest dev version of lance. Because we use the ~= specifier in LanceDB, it will be compatible with the current stable version of lancedb.

What about Rust?

Given that Rust is a source-only release, I think it would work fine if we just had users who want pre-releases use git specifiers. With the above changes in versioning scheme, this should work well as a LanceDB dependency.

What about Node?

npm doesn't appear to have a size limit. We could just upload beta releases directly to NPM.

Reduce binary size

TODO

Part 1: Lance versioning

Part 2: LanceDB-Lance relationship

  • Add a CI job to Lance that runs LanceDB's test suite against current version
    • Should handle both Python and Rust
    • Should short-circuit if we've bumped the minor version of Lance
  • Move LanceDB to pin with ~=0.MINOR.PATCH

Part 3: LanceDB versioning

Adopt same "stable" / "preview" scheme as Lance.

  • Change main release script to patch increment after release
  • Create a triggered job that will:
    1. Create a beta version tag (for example: 0.1.1.beta1)
    2. Generate a GH release with automatic changelog
    3. Release wheels to fury.io
    4. Release wheels to PyPI
    5. Cleanup old beta wheels
  • Update release process instructions
  • Add beta releases to installation page
    • Provide instructions on how to use with pip, requirements.txt, and others
    • Explain that beta releases have newer features and the same level of testing as a typical release
  • Add linked stable and preview docs

eddyxu commented Apr 19, 2024

Generally, these look like good directions.

Some feedback:

  • Monthly release is too long for this project as of today. Can we do weekly (or at least bi-weekly)? E.g., Ray recently changed from monthly to weekly releases.
  • We can drop Intel Mac support now.
  • Nightly release can be Linux (x86/arm) only if needed.

@wjones127

Monthly release is too long for this project as of today
Nightly release can be Linux (x86/arm) only if needed.

I think this is ultimately a question of how much we want to rely on stable vs nightly releases. If we make nightly releases something users only do in an emergency, then I agree weekly stable releases would be preferable.

But what I am thinking is we should make nightly releases so easy to use, that users will use them most of the time when developing something new. We will have it in our docs showing how to download development versions. All the same platforms will be supported as in stable releases. And we will test against LanceDB to make sure they are compatible with current stable and nightly LanceDB (unless there is a breaking change).

Either way we go, we should make sure our investment in nightly releases is worth it. If we decide that they are only for rare instances, we shouldn't invest as much effort in dev documentation or LanceDB integration testing.


wjones127 commented Apr 22, 2024

TODO:

wjones127 self-assigned this Apr 26, 2024