
NP Regression Model w/ LIG Acquisition #2683

Open · wants to merge 12 commits into main

Conversation

@eibarolle

Motivation

This pull request adds a Neural Process Regression model with a Latent Information Gain acquisition function to BoTorch.

Have you read the Contributing Guidelines on pull requests?

Yes, and I've followed all the steps, including testing.

Test Plan

I wrote unit tests for both the model and the acquisition function; the test files are in the appropriate folder, and all of them pass under pytest.

Related

I made a repository holding the pushed files at https://github.com/eibarolle/np_regression, and it has the appropriate API documentation.

@facebook-github-bot added the CLA Signed label on Jan 18, 2025.
@sdaulton (Contributor)

Hi @eibarolle! Thanks for the PR! I'll review it shortly.

@sdaulton (Contributor) left a review


Thanks for the PR! This is looking good. I left some comments inline.

import torch
import torch.nn as nn
import matplotlib.pyplot as plts
# %matplotlib inline

Suggested change (remove):
# %matplotlib inline

Comment on lines 24 to 27
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
ExpSineSquared, DotProduct,
ConstantKernel)

Suggested change (remove):
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
                                              ExpSineSquared, DotProduct,
                                              ConstantKernel)

Let's avoid the sklearn dependency since it isn't used

ConstantKernel)
from typing import Callable, List, Optional, Tuple
from torch.nn import Module, ModuleDict, ModuleList
from sklearn import preprocessing

Suggested change (remove):
from sklearn import preprocessing

from typing import Callable, List, Optional, Tuple
from torch.nn import Module, ModuleDict, ModuleList
from sklearn import preprocessing
from scipy.stats import multivariate_normal

Suggested change (remove):
from scipy.stats import multivariate_normal

Let's remove the unused imports

from scipy.stats import multivariate_normal
from gpytorch.distributions import MultivariateNormal

device = torch.device("cpu")

Suggested change (remove):
device = torch.device("cpu")

Let's make the code agnostic to the device used.
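For instance (just an illustrative sketch, not this PR's actual code), the device can be inferred from the inputs and module parameters rather than fixed at import time:

import torch
import torch.nn as nn

class ToyEncoder(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, x_dim: int = 2, r_dim: int = 8) -> None:
        super().__init__()
        self.net = nn.Linear(x_dim, r_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # New tensors are created from existing ones, so they inherit the
        # device/dtype of `x` rather than a hard-coded torch.device("cpu").
        noise = torch.randn_like(x)
        return self.net(x + noise)

encoder = ToyEncoder()
out = encoder(torch.rand(4, 2))  # works unchanged on CPU or, after .to("cuda"), on GPU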

Comment on lines 294 to 297
if n == 1:
eps = torch.autograd.Variable(logvar.data.new(self.z_dim).normal_()).to(device)
else:
eps = torch.autograd.Variable(logvar.data.new(n,self.z_dim).normal_()).to(device)

Suggested change (replace):
    if n == 1:
        eps = torch.autograd.Variable(logvar.data.new(self.z_dim).normal_()).to(device)
    else:
        eps = torch.autograd.Variable(logvar.data.new(n, self.z_dim).normal_()).to(device)
with:
    shape = [n, self.z_dim]
    if n == 1:
        shape = shape[1:]
    eps = torch.autograd.Variable(logvar.data.new(*shape).normal_()).to(device)

This is a bit more concise
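As an aside (a sketch only; the surrounding code in the PR may differ), torch.autograd.Variable has been a no-op wrapper since PyTorch 0.4, so the noise can be drawn directly with torch.randn on the right device:

import torch

def sample_eps(logvar: torch.Tensor, z_dim: int, n: int = 1) -> torch.Tensor:
    # Standard-normal noise on the same device/dtype as `logvar`,
    # with the leading sample dimension dropped when n == 1.
    shape = (z_dim,) if n == 1 else (n, z_dim)
    return torch.randn(shape, device=logvar.device, dtype=logvar.dtype)

eps = sample_eps(torch.zeros(3), z_dim=3, n=5)  # shape (5, 3)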

mu: torch.Tensor,
logvar: torch.Tensor,
n: int = 1,
min_std: float = 0.1,

This default seems high, no?

Comment on lines 356 to 368
def load_state_dict(
self,
state_dict: dict,
strict: bool = True
) -> None:
"""
Initialize the fully Bayesian model before loading the state dict.

Args:
state_dict (dict): A dictionary containing the parameters.
strict (bool): Case matching strictness.
"""
super().load_state_dict(state_dict, strict=strict)

Not needed, since it just calls the parent class's method.

Comment on lines 450 to 455
ind = np.arange(x.shape[0])
mask = np.random.choice(ind, size=n_context, replace=False)
x_c = torch.from_numpy(x[mask])
y_c = torch.from_numpy(y[mask])
x_t = torch.from_numpy(np.delete(x, mask, axis=0))
y_t = torch.from_numpy(np.delete(y, mask, axis=0))

Any reason to not do this in pytorch?
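For example (an illustrative sketch, assuming x and y are already tensors), the same random context/target split can be done with torch.randperm:

import torch

def context_target_split(x: torch.Tensor, y: torch.Tensor, n_context: int):
    # Random permutation on the same device as the data; no numpy round trip.
    perm = torch.randperm(x.shape[0], device=x.device)
    ctx, tgt = perm[:n_context], perm[n_context:]
    return x[ctx], y[ctx], x[tgt], y[tgt]

x, y = torch.rand(10, 2), torch.rand(10, 1)
x_c, y_c, x_t, y_t = context_target_split(x, y, n_context=4)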

import torch
#reference: https://arxiv.org/abs/2106.02770

class LatentInformationGain:

Can we implement this as a subclass of AcquisitionFunction, so that we can use it more organically in BoTorch? Likely the context would need to be provided in LatentInformationGain.__init__.
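Roughly along these lines (a skeleton only; the information-gain computation itself is elided, and the attribute names are illustrative):

import torch
from torch import Tensor
from botorch.acquisition import AcquisitionFunction
from botorch.models.model import Model

class LatentInformationGain(AcquisitionFunction):
    def __init__(self, model: Model, context_x: Tensor, context_y: Tensor, num_samples: int = 10) -> None:
        # Context data is fixed at construction time; only candidates go through forward().
        super().__init__(model=model)
        self.register_buffer("context_x", context_x)
        self.register_buffer("context_y", context_y)
        self.num_samples = num_samples

    def forward(self, X: Tensor) -> Tensor:
        # X: candidate points; compute the expected KL between the latent
        # posterior (context + candidates) and the latent prior (context only).
        raise NotImplementedError  # elided in this sketch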

@eibarolle (Author)

I applied the suggested changes to my new code. Note that I kept the decoder as is, since the other functions are designed around its current state.

@Balandat (Contributor)

@hvarfner curious if you have any thoughts on this PR re the information gain aspects

@hvarfner (Contributor)

Interesting! I'll check out the paper quickly and get back to you


self.acquisition_function.num_samples = 20
lig_2 = self.acquisition_function.forward(candidate_x=self.candidate_x)
self.assertTrue(lig_2.item() < lig_1.item())

Why would this unit test necessarily be a good check? I am not sure on the details, but it seems to me that this just improves the accuracy of the acquisition computation.

self.context_x = context_x.to(device)
self.context_y = context_y.to(device)

def forward(self, candidate_x):

It seems to me like the acquisition function computes the information gain on one batch of points, with the batch being in the first dimension. Thus, the output of this forward would be one scalar.

This would run contrary to the acquisition function convention, and so it wouldn't be able to be used with optimize_acqf etc.

@hvarfner (Contributor) commented on Jan 28, 2025

You would want the forward to be able to handle an N x q x D-shaped input, where you are currently only computing the q-element (but that aspect seems correct as far as I can tell right now). This may be a bit challenging, but it certainly looks doable!

I think this is a pretty cool idea that could be useful generally in latent space models, so it would be nice to have some fairly general naming for the encoding steps in case this were to be used with other encoder-decoder based architectures.
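Purely as a shape sketch (the actual gain computation is omitted), forward would map an N x q x D batch of candidates to N values, reducing over q:

import torch
from torch import Tensor

def lig_forward(X: Tensor) -> Tensor:
    # X: N x q x D. Placeholder per-point quantities of shape N x q ...
    per_point = X.new_zeros(X.shape[0], X.shape[1])
    # ... reduced over q, giving the shape-N output that optimize_acqf expects.
    return per_point.sum(dim=-1)

assert lig_forward(torch.rand(5, 3, 2)).shape == torch.Size([5])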

@eibarolle (Author)

The Latent Information Gain forward function has been updated with the correct dimensions, with the test cases adjusted as needed.

@eibarolle (Author) commented on Feb 10, 2025

Please let me know when you can check over the new Latent Information Gain function. @hvarfner

@eibarolle (Author)

Any updates @hvarfner?

@hvarfner (Contributor)

Hi @eibarolle,

Thanks for pinging, and sorry for being slow in this. I've taken a good look at it now.

Some of the issues pointed out by @sdaulton (MAE etc.) are still unaddressed, so it would be nice to have those taken care of. Moreover, the code needs to be formatted (see CONTRIBUTING.md).

On my previous point, the acquisition function should return a tensor of shape N, unless the model is an ensemble. Thus, the q-dimension should be reduced over. In this case, it should probably amount to a sum(dim=-1), but that is just my educated guess.

Moreover, it is not obvious how to run the code, since it deviates substantially from other BoTorch models (e.g. SingleTaskGP). Thus, I think this PR would have to come with an example (e.g. a tutorial) of the intended usage. Moreover, the code should adhere as closely as possible to the standard BoTorch interfaces that are in place, ideally so that one can swap an STGP for an NPR with mostly the same arguments otherwise. As far as I can tell, this should be doable. The same goes for the training of the model, which you aptly pointed out in a previous commit here! Specifically, here are the obvious deviations from the convention that should be addressed:

  • The model should take train_X and (optionally) train_Y, but not much more than that
  • The acquisition function should take the model, but not the training data
  • The model should be trained with a BoTorch optimizer (probably fit_gpytorch_mll_torch in your case)

It seems like you have done it differently: the model takes NN parameters (these should, if anything, be kwargs with sensible defaults, like here), and, if I am not mistaken, the acquisition function takes the train_X and train_Y. The idea is to have NPR work with other acquisition functions, and to have LIG work with other models, as long as they have a latent space with a similar method to call for the latent space predictions.
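For concreteness, here is roughly the interaction pattern a tutorial should demonstrate, shown with standard components (SingleTaskGP and ExpectedImprovement); the goal is that NPR and LIG can be dropped into the same slots:

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 2, dtype=torch.float64)
train_Y = (train_X ** 2).sum(dim=-1, keepdim=True)

# The model takes train_X and train_Y (plus optional kwargs with sensible defaults).
model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)  # an NP model would instead use a torch optimizer, e.g. fit_gpytorch_mll_torch

# The acquisition function takes the model, not the training data.
acqf = ExpectedImprovement(model=model, best_f=train_Y.max())
candidate, value = optimize_acqf(
    acqf,
    bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.float64),
    q=1,
    num_restarts=4,
    raw_samples=32,
)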

I recognize that this is a lot of work, but adhering to this standard ultimately ensures the usefulness of your code.

Now, I am not quite sure what the bar is for community contributions, so it would be good to get e.g. @sdaulton 's or @Balandat 's take on this as well.

@eibarolle
Copy link
Author

Got it, thanks for the advice and pointers. I'll make the needed additions and examples, and I'll reach out with any questions.
