Skip to content

Conversation

@AdrianSosic
Copy link
Collaborator

@AdrianSosic AdrianSosic commented Oct 29, 2025

Fixes two problems:

  • General surrogate caching bug for cases where multi-model surrogates are used. The issue was that get_surrogate trained an ephemeral (😬) surrogate object in this case, i.e. did not store the trained model as an attribute. Therefore, the cache could never be re-used in a second call. The solution is simple: instead of creating the replicated surrogate on-the-fly from the stored template, we store the replicated surrogate itself.
  • A bug where IndependentGaussianSurrogates would still allow batch recommendation requests when they are wrapped as part of other surrogates (e.g. CompositeSurrogate)

Note: Does not address #568. This will be fixed once #662 is merged and/or the discrete search spaces have been refactored.

@AdrianSosic AdrianSosic self-assigned this Oct 29, 2025
Copilot AI review requested due to automatic review settings October 29, 2025 08:51
@AdrianSosic AdrianSosic added the bug Something isn't working label Oct 29, 2025
@AdrianSosic AdrianSosic requested a review from AVHopp as a code owner October 29, 2025 08:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a surrogate caching bug where multi-model surrogates (used with objectives requiring multiple models) were not being properly cached. Previously, get_surrogate created ephemeral surrogate objects that weren't stored as attributes, preventing cache reuse on subsequent calls. The fix ensures that replicated surrogates are stored directly instead of recreating them from templates.

Key Changes:

  • Modified surrogate storage to use replicated surrogates directly via converter
  • Simplified get_surrogate to use the stored surrogate instance
  • Added comprehensive test coverage for the caching behavior

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
baybe/recommenders/pure/bayesian/base.py Applied _autoreplicate converter to _surrogate_model field and simplified get_surrogate to use stored instance directly
tests/test_surrogate.py Added tests for surrogate caching with different objective types and a smoke test for replicated surrogates
CHANGELOG.md Documented the bug fix in the Fixed section

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@AdrianSosic AdrianSosic force-pushed the fix/surrogate_caching branch from c9bb85c to 594ce02 Compare October 29, 2025 09:42
@AdrianSosic AdrianSosic marked this pull request as draft October 31, 2025 13:01
@AdrianSosic AdrianSosic force-pushed the fix/surrogate_caching branch from 3437d90 to 7ab3ad5 Compare November 3, 2025 14:21
Previously, a temporary object was created on the fly in the multi-model
case, which meant that the cache could not be used.
The RandomForestSurrogate otherwise outsources operations to a
MeanPredictionModel, which does not support batching.
Was mistakenly ignored previously due to suboptimal batching check.
@AdrianSosic AdrianSosic force-pushed the fix/surrogate_caching branch from 0bd9482 to ca18ae6 Compare November 3, 2025 15:15
run_iterations(campaign, n_iterations, batch_size)
try:
run_iterations(campaign, n_iterations, batch_size)
except OptionalImportError:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Scienfitz: do we have an immediate nicer solution to this? I could see that we potentially add some is_available classproperty to the surrogates or something that would simplify their collection in tests, but right now that doesn't exist. The previous skip at the top of the file actually never had an effect due to lazy imports.

@AdrianSosic AdrianSosic marked this pull request as ready for review November 4, 2025 13:41
alias="surrogate_model", factory=GaussianProcessSurrogate
alias="surrogate_model",
factory=GaussianProcessSurrogate,
converter=_autoreplicate,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AVHopp @Scienfitz
As discussed in the meeting, the PR is pretty much ready in terms of fixing the targeted problem. However, it uncovered that second (related) problem I explained to you, which is best demonstrated in the bandit example, where a surrogate passed by the user is only used as a "template" and is not mutated (i.e. the user won't be able to access fitted parameters on that object). Note that the problem is not introduced by the PR but already existed before, e.g. if you passed a single-output surrogate but were in a multi-output context, where the same replication mechanism would happen.

To "fix" this problem, we should first clearly define what our expectations are in terms of resulting behavior. That is, what do we think a user would expect to happen:

  1. When the provide a 1-D surrogate to a 1-D use case
  2. ... a 1-D surrogate to an N-D use case
  3. ... an N-D surrogate to an N-D use case and
  4. ... an N-D surrogate to a 1-D use case

Also, there is a difference whether you provide an N-D surrogate with explicit dimensions or one that "automagically" broadcasts to N dimensions using replication.

And related to the above, what do we expect people to access the fitted surrogate? Specifically, the bandit example would work even with the current PR code but would require some rather unexpected / inelegant access like

recommender.get_surrogate()   # <-- this is a composite one, even though a regular one was provided
.surrogates[target.name]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants