Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tap: memoize loading installed taps #16806

Closed
wants to merge 1 commit into from

Conversation

apainintheneck
Copy link
Contributor

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

See #16791 (comment)

This adds a new Tap.tap_registry method that will load all installed and core taps only once instead of doing it every time on the fly. We only expect this list to change when a tap is installed or uninstalled in the lifetime of the program. The only places where that can happen are in the Tap#install and Tap#uninstall methods so we make sure to update the tap registry when those methods are called.

This increases the performance of Tap#reverse_tap_migrations_renames since most of that time was spent in Tap.each loading the installed taps from the tap directory.

On my local computer, the performance goes from 3+ seconds to around 0.2 when calling Tap.each(&:to_s) in a loop 6k times. 6k is around the number of formulae in the core tap so this is not unheard of.

Before:

==> Interactive Homebrew Shell              
Example commands available with: `brew irb --examples`
brew(main):001:0> require "benchmark"
=> true
brew(main):002:1* Benchmark.realtime do
brew(main):003:1*   6_000.times { Tap.each(&:to_s) }
brew(main):004:0> end
=> 3.4574323009983345
brew(main):005:0> Tap.clear_cache
=> T::Private::Types::Void::VOID
brew(main):006:1* Benchmark.realtime do
brew(main):007:1*   6_000.times { Tap.reverse_tap_migrations_renames }
brew(main):008:0> end
=> 4.388525905000279
brew(main):009:0> exit

After:

==> Interactive Homebrew Shell
Example commands available with: `brew irb --examples`
brew(main):001:0> require "benchmark"
=> true
brew(main):002:1* Benchmark.realtime do
brew(main):003:1*   6_000.times { Tap.each(&:to_s) }
brew(main):004:0> end
=> 0.027163652001036098
brew(main):005:0> Tap.clear_cache
brew(main):006:1* Benchmark.realtime do
brew(main):007:1*   6_000.times { Tap.reverse_tap_migrations_renames }
brew(main):008:0> end
=> 1.0597649259980244

There might be additional improvements that can be made to Tap.reverse_tap_migrations_renames but there will be diminishing returns. We can consider doing that work in a separate PR.

@apainintheneck apainintheneck marked this pull request as ready for review March 3, 2024 23:33
Comment on lines -143 to -146
specify "::names" do
expect(described_class.names.sort).to eq(["homebrew/core", "homebrew/foo"])
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be deprecated soonish so I figured we might as well remove the test instead of updating it. I guess we weren't automatically including the core cask tap on macOS before in this list.

Library/Homebrew/tap.rb Show resolved Hide resolved
Library/Homebrew/tap.rb Outdated Show resolved Hide resolved
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. Feels a little weird to add a new concept ("tap registry") instead of just busting the original cache in these very unusual/never-in-a-hot-loop situations? Not a blocker though if you and @Bo98 both disagree.

@@ -60,7 +60,7 @@ def tap
tap.fix_remote_configuration
end
elsif args.no_named?
puts Tap.select(&:installed?)
puts Tap.select(&:installed?).sort_by(&:to_s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
puts Tap.select(&:installed?).sort_by(&:to_s)
puts Tap.select(&:installed?).sort_by(&:name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be equivalent but I'll change it back for clarity.

return to_enum unless block
sig { returns(Integer) }
def hash
[self.class, name].hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need the class here? It shouldn't be possible to get a Tap for e.g. homebrew/core anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is considered good practice to include, in case this is ever used in mixed-type collections: https://ruby-doc.org/3.2.2/Object.html#method-i-hash

def self.registry
cache[:registry] ||= Set.new.tap do |set|
if TAP_DIRECTORY.directory?
set.merge TAP_DIRECTORY.subdirs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't merge create a new Set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it always returns self:

Merges the elements of the given enumerable object to the set and returns self.

@reitermarkus
Copy link
Member

instead of just busting the original cache in these very unusual/never-in-a-hot-loop situations

Yes, given this probably only happens rarely, maybe Tap.clear_cache is enough?

@reitermarkus
Copy link
Member

reitermarkus commented Mar 4, 2024

Maybe the simplest solution would be to call Tap::clear_cache in Tap#clear_cache, i.e. every time a single tap's cache needs to be cleared, the global cache should also be cleared.

@apainintheneck
Copy link
Contributor Author

Maybe the simplest solution would be to call Tap::clear_cache in Tap#clear_cache, i.e. every time a single tap's cache needs to be cleared, the global cache should also be cleared.

I'm not a fan of this unless we wind up putting more stuff in the class level Tap cache that needs to be cleared at the same time as the instance cache. Ideally, we'd find a way to stuff more logic into the instance cache to avoid that needing to add more stuff to the class level cache.

@apainintheneck
Copy link
Contributor Author

Busting the class level Tap cache might be convenient to cover all the bases but it's not strictly necessary right now. I'm fine with that if it's preferable over updating the list of taps manually. Complexity-wise I think they're around the same.

Let me sketch out the alternative implementation from what I gather from the comments.

  1. Cache the list of taps once in Tap.each presumably in cache[:each] at the class level. This could use an array which we make sure is unique by name when caching.
  2. Clear the class level Tap cache in Tap#install and Tap#uninstall after the instance level cache gets cleared (that seems like a decent spot to put it).

I guess the advantage of this approach is that it doesn't require a separate Tap.registry method or Tap#hash to be redefined but it does mean that we will clear the class level cache more frequently. Could this possibly cause any unexpected behavior? We don't normally clear the class level cache outside of tests. I can't think of anything that would go wrong though.

The main difficulty with caching is remembering when to invalidate the cache and both potential implementations have the same weaknesses there.

@apainintheneck

This comment was marked as resolved.

@MikeMcQuaid
Copy link
Member

I guess the advantage of this approach is that it doesn't require a separate Tap.registry method or Tap#hash to be redefined but it does mean that we will clear the class level cache more frequently. Could this possibly cause any unexpected behavior? We don't normally clear the class level cache outside of tests. I can't think of anything that would go wrong though.

The only thing I can see going wrong is reduced performance.

@reitermarkus
Copy link
Member

Ideally, we'd find a way to stuff more logic into the instance cache

I doubt there is a way, otherwise we end up with the instance cache depending on other taps, which makes even less sense.

@apainintheneck apainintheneck marked this pull request as draft March 6, 2024 05:04
This adds a new `Tap.tap_registry` method that will load all
installed and core taps only once instead of doing it every time
on the fly. We only expect this list to change when a tap is
installed or uninstalled in the lifetime of the program. The only
places where that can happen are in the `Tap#install` and
`Tap#uninstall` methods so we make sure to update the tap registry
when those methods are called.

This increase the performance of `Tap#reverse_tap_migrations_renames`
since most of that time was spent in `Tap.each` loading the installed
taps from the tap directory.

On my local computer, the performance goes from 3+ seconds to
around 0.2 when calling `Tap.each(&:to_s)` in a loop 6k times.
6k is around the number of formulae in the core tap so this
is not unheard of.

Other changes:
- Override `Tap#eql?` and `Tap.hash` so we can store them uniquely in sets
- Update tests to check that taps are getting added and removed from the
  registry when they get installed and uninstalled
@apainintheneck
Copy link
Contributor Author

I doubt there is a way, otherwise we end up with the instance cache depending on other taps, which makes even less sense.

I was thinking of it more in the context of speeding up Tap#reverse_tap_migrations_renames by caching more info per tap. I don't think it will work for a list of all taps though.

@reitermarkus
Copy link
Member

speeding up Tap#reverse_tap_migrations_renames by caching more info per tap.

We should probably make tap_migrations itself cache more stuff first, i.e. instead of returning T::Hash[String, String], return T::Hash[String, [Tap, String]] so we don't have to parse it in reverse_tap_migrations_renames.

@apainintheneck
Copy link
Contributor Author

Closing because the alternative PR already got merged in: #16863

@apainintheneck apainintheneck deleted the memoize-installed-tap-loading branch March 15, 2024 08:38
@github-actions github-actions bot added the outdated PR was locked due to age label Apr 15, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants