Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve UX and Performance of Install step #12712

Open
1 task done
notatallshaw opened this issue May 19, 2024 · 6 comments
Open
1 task done

Improve UX and Performance of Install step #12712

notatallshaw opened this issue May 19, 2024 · 6 comments
Labels
type: performance Commands take too long to run UX User experience related

Comments

@notatallshaw
Copy link
Contributor

notatallshaw commented May 19, 2024

What's the problem this feature will solve?

At the moment when the final install step starts pip gives no output what it is doing, in some real world cases (e.g. large pytorch installations or airflow installs) this steps can take over 30 seconds on fast machines, so minutes on slow machines. The user is left wondering if anything is happening.

Describe the solution you'd like

I would like to see the following improvements:

  1. Log a message that pip is starting to install packages
  2. Present a progress bar that tracks the number of packages installed out of the total packages to be installed
  3. Improve any obvious performance bottlenecks (see follow up post with profile)
  4. Run installs in parallel (made seperate issue Install packages in parallel #12742)

Alternative Solutions

I think at a bare minimum there should be a log message that lets the user know what's happening.

Additional context

uv runs installs in parallel, and following their issue tracker it does not appear to be problematic, to do this a cli option to control the maximum number would need to be added, the same as how there is a PR for parallel downloads to do this.

Code of Conduct

@notatallshaw notatallshaw added S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature labels May 19, 2024
@notatallshaw
Copy link
Contributor Author

notatallshaw commented May 19, 2024

This scenario is artifically constructed to best profile the installer code by removing the need to download, build sdists, or resolve:

  1. python3.12 -m venv .venv
  2. source .venv/bin/activate
  3. <install latest/dev pip>
  4. wget https://raw.githubusercontent.com/apache/airflow/constraints-2.9.1/constraints-3.12.txt
  5. python -m pip download -d downloads -r constraints-3.12.txt
  6. cd downloads
  7. for file in $(ls *.tar.gz); do pip wheel --no-deps "$file" && mv "$file" "$file".built ; done
  8. for file in $(ls *.zip); do pip wheel --no-deps "$file" && mv "$file" "$file".built ; done
  9. cd -
  10. python -m pip install --only-binary=:all: --no-index --ignore-installed --no-deps --find-links file://${PWD}/downloads -r constraints-3.12.txt

I ran with and without --dry-run to see the timing difference:
Dry Run: 32s
Regular Install: 144s

I profiled with and without --dry-run to see the profile difference:

Dry Run Profile

airflow-no-deps-dry-run-install

Regular Install Profile

airflow-no-deps-install

There are some clear hotspots here, I will take a look when I have time if there are some easy ways to reduce those hotspots if no one else does.

@ichard26 ichard26 added type: performance Commands take too long to run and removed type: feature request Request for a new feature S: needs triage Issues/PRs that need to be triaged labels May 19, 2024
@ichard26
Copy link
Member

The get_dist_name() hot spot should be vastly improved by #12656 FWIW. I scheduled the PR for 24.2 as it feels a bit risky to ship in 24.1 final. Please say something if anyone feels differently.

@pfmoore
Copy link
Member

pfmoore commented May 19, 2024

I see no issues with the UI proposal, but I'd want parallel installs to be a separate feature. I can imagine pathological cases where things could break when installing in parallel, and while the experience of uv is encouraging (as is the fact that normal cases are clearly safe) my instinct is that every pathological case is being exercised by some user of pip, somewhere. So we should isolate the risk here by making it a separate feature.

@ichard26 ichard26 added the UX User experience related label May 19, 2024
@notatallshaw
Copy link
Contributor Author

notatallshaw commented May 19, 2024

The get_dist_name() hot spot should be vastly improved by #12656 FWIW. I scheduled the PR for 24.2 as it feels a bit risky to ship in 24.1 final. Please say something if anyone feels differently.

Great, I'll reprofile with this PR. I personally wasn't imagining any of these ideas would land for 24.1.

I'd want parallel installs to be a separate feature

Agree, I'll make a seperate issue for that.

Honestly, the others I feel like I could make PRs that safely improve pip, I'm unsure about parralel installs, I think it would at a minimum carefully need to look at what current multiple installs tests there are and potentially expanding them to have a good matrix of different possibilities.

@notatallshaw
Copy link
Contributor Author

notatallshaw commented Jun 2, 2024

Log a message that pip is starting to install packages

Btw, I was looking at this recently because I noticed pip does tell you it's installing packages. The specific scenario I was seeing was the following:

  1. You install a large number of packages
  2. You then install a large number of semi-overlapping packages

On step two this produces the following behavior:

  1. Packages are resolved and pip tells you what packages it is going to install
  2. Pip then quickly uninstalls old packages, filling up the screen
  3. There is a long wait with no update on the screen while pip is installing
  4. Pip then lists all packages it installed

The real world situaiton this happens is installing large machine learning packages, particularly because you install a bunch of packages from the pytorch index, and then install a bunch of packages from pypi.

I think there are a couple of possible solutions:

  1. Re-order or add additional messages, e.g. move or add and "install" message after the uninstalls have completed
  2. Add progress bars to both uninstalling packages and installing packages, so it's clear pip is doing things

I will take a look at PRs when I have a chance.

@ichard26
Copy link
Member

ichard26 commented Jun 6, 2024

Caching the result of utils.compatibility_tags.get_supported() in the resolver factory should be another easy win1 (~3% or 4s in the example above)

return self._wheel_cache.get_cache_entry(
link=link,
package_name=name,
supported_tags=get_supported(),
)

I'll submit a PR when I get the chance.

Footnotes

  1. I strongly suspect that get_supported() is only "slow" (as in, 1-5ms) on Linux due to the large amount of supported tags per system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: performance Commands take too long to run UX User experience related
Projects
None yet
Development

No branches or pull requests

3 participants