-
Notifications
You must be signed in to change notification settings - Fork 59
Surrogate caching #682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Surrogate caching #682
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a surrogate caching bug where multi-model surrogates (used with objectives requiring multiple models) were not being properly cached. Previously, get_surrogate created ephemeral surrogate objects that weren't stored as attributes, preventing cache reuse on subsequent calls. The fix ensures that replicated surrogates are stored directly instead of recreating them from templates.
Key Changes:
- Modified surrogate storage to use replicated surrogates directly via converter
- Simplified
get_surrogateto use the stored surrogate instance - Added comprehensive test coverage for the caching behavior
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| baybe/recommenders/pure/bayesian/base.py | Applied _autoreplicate converter to _surrogate_model field and simplified get_surrogate to use stored instance directly |
| tests/test_surrogate.py | Added tests for surrogate caching with different objective types and a smoke test for replicated surrogates |
| CHANGELOG.md | Documented the bug fix in the Fixed section |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c9bb85c to
594ce02
Compare
3437d90 to
7ab3ad5
Compare
Previously, a temporary object was created on the fly in the multi-model case, which meant that the cache could not be used.
The RandomForestSurrogate otherwise outsources operations to a MeanPredictionModel, which does not support batching. Was mistakenly ignored previously due to suboptimal batching check.
0bd9482 to
ca18ae6
Compare
| run_iterations(campaign, n_iterations, batch_size) | ||
| try: | ||
| run_iterations(campaign, n_iterations, batch_size) | ||
| except OptionalImportError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scienfitz: do we have an immediate nicer solution to this? I could see that we potentially add some is_available classproperty to the surrogates or something that would simplify their collection in tests, but right now that doesn't exist. The previous skip at the top of the file actually never had an effect due to lazy imports.
| alias="surrogate_model", factory=GaussianProcessSurrogate | ||
| alias="surrogate_model", | ||
| factory=GaussianProcessSurrogate, | ||
| converter=_autoreplicate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AVHopp @Scienfitz
As discussed in the meeting, the PR is pretty much ready in terms of fixing the targeted problem. However, it uncovered that second (related) problem I explained to you, which is best demonstrated in the bandit example, where a surrogate passed by the user is only used as a "template" and is not mutated (i.e. the user won't be able to access fitted parameters on that object). Note that the problem is not introduced by the PR but already existed before, e.g. if you passed a single-output surrogate but were in a multi-output context, where the same replication mechanism would happen.
To "fix" this problem, we should first clearly define what our expectations are in terms of resulting behavior. That is, what do we think a user would expect to happen:
- When the provide a 1-D surrogate to a 1-D use case
- ... a 1-D surrogate to an N-D use case
- ... an N-D surrogate to an N-D use case and
- ... an N-D surrogate to a 1-D use case
Also, there is a difference whether you provide an N-D surrogate with explicit dimensions or one that "automagically" broadcasts to N dimensions using replication.
And related to the above, what do we expect people to access the fitted surrogate? Specifically, the bandit example would work even with the current PR code but would require some rather unexpected / inelegant access like
recommender.get_surrogate() # <-- this is a composite one, even though a regular one was provided
.surrogates[target.name]
Fixes two problems:
get_surrogatetrained an ephemeral (😬) surrogate object in this case, i.e. did not store the trained model as an attribute. Therefore, the cache could never be re-used in a second call. The solution is simple: instead of creating the replicated surrogate on-the-fly from the stored template, we store the replicated surrogate itself.IndependentGaussianSurrogateswould still allow batch recommendation requests when they are wrapped as part of other surrogates (e.g.CompositeSurrogate)Note: Does not address #568. This will be fixed once #662 is merged and/or the discrete search spaces have been refactored.