
MultiOutputGP emulators run in parallel on GPU #156

Open
ots22 opened this issue Feb 18, 2021 · 2 comments


ots22 commented Feb 18, 2021

Currently the emulators run in serial (through the usual MultiOutputGP Python class), since GaussianProcessGPU can't be pickled for multiprocessing.

Options to fix:

  • Run each output GP serially (for the initial version; a considerable speedup is still expected in many cases) - currently implemented
  • Add pickling/unpickling (see an older version that had this) - done, but see below
  • Handle multiple GPs within library code

One recent multi-emulator example has 60 input points, a prediction batch of 2000 points, and 100000 output emulators.
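The first option (the serial fallback) amounts to looping over the output emulators one at a time. A minimal sketch, assuming a stand-in emulator class rather than the real GaussianProcessGPU API:

```python
# Hypothetical sketch of the serial fallback: MultiOutputGP loops over its
# output emulators instead of dispatching them via multiprocessing.
# FakeGPUEmulator is a stand-in, not the real mogp-emulator class.
import numpy as np

class FakeGPUEmulator:
    """Stand-in for GaussianProcessGPU: predicts the mean of its targets."""
    def __init__(self, targets):
        self.targets = np.asarray(targets)

    def predict(self, testing):
        n = np.asarray(testing).shape[0]
        return np.full(n, self.targets.mean())

def predict_serial(emulators, testing):
    # Serial loop across emulators: each GP still does its linear algebra
    # on the GPU, so a considerable speedup over the CPU version is still
    # expected even without process-level parallelism across emulators.
    return np.stack([em.predict(testing) for em in emulators])

emulators = [FakeGPUEmulator([1.0, 3.0]), FakeGPUEmulator([4.0, 6.0])]
means = predict_serial(emulators, np.zeros((5, 2)))
print(means.shape)  # (2, 5): one row of predictions per output emulator
```

With many output emulators (as in the 100000-emulator example above), the per-emulator Python loop overhead is what the later options try to eliminate.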

@ots22 ots22 added the gpu label Feb 18, 2021
@ots22 ots22 added this to the Merge feature/gpu milestone Feb 18, 2021
nbarlowATI commented Mar 3, 2021

Summary of our thought process during a working session on this:

  • The obstacle we hit before was the pickling/unpickling required by multiprocessing's starmap. Using __setstate__ and __getstate__ from the previous version of GaussianProcessGPU, we can now pickle and unpickle.
  • However, we still see an error when we try to use a MultiOutputGP containing GaussianProcessGPU:

```
terminate called after throwing an instance of 'thrust::system::system_error'
  what(): device free failed : initialization error
```
  • In any case, we think that using starmap for predict will be problematic, since each emulator would need to be refit after unpickling in the new process.
  • We then considered two alternative possibilities:
    • Running the emulators in serial in MultiOutputGP - possibly useful as a quick first step to get feature parity with the CPU version so that the branch can be merged
    • Making a C++ MultiOutputGP - this seems the best approach: a C++ class that mirrors the structure of the Python MultiOutputGP (i.e. owns several DenseGP_GPU objects).
  • We also spent some time investigating whether the destructor of DenseGP_GPU is ever called when the Python object is deleted - it did not appear to be.
    • What to do about this? Write a custom method that mimics a destructor? Write a wrapper class that owns a DenseGP_GPU and has a cleanup() method to delete it? A C++ MultiOutputGP may solve this for us.
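The pickling pattern discussed above, and why starmap-based predict would pay a refit in every worker, can be sketched as follows. This is illustrative only: PicklableGPUEmulator and its `_fit` method are hypothetical stand-ins, not the mogp-emulator implementation.

```python
# Sketch, under assumed names: the GPU handle cannot be pickled, so
# __getstate__ drops it and __setstate__ rebuilds (refits) it in the new
# process - the cost that makes starmap-based predict problematic.
import pickle
import numpy as np

class PicklableGPUEmulator:
    def __init__(self, inputs, targets):
        self.inputs = np.asarray(inputs)
        self.targets = np.asarray(targets)
        self._fit()

    def _fit(self):
        # Stand-in for creating the device-side object and fitting the GP.
        self.refit_count = getattr(self, "refit_count", -1) + 1
        self._device_gp = lambda: None  # un-picklable stand-in for a GPU handle

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_device_gp"]  # drop the GPU handle before pickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._fit()  # must refit after unpickling in the new process

gp = PicklableGPUEmulator([[0.0], [1.0]], [0.0, 1.0])
gp2 = pickle.loads(pickle.dumps(gp))
print(gp.refit_count, gp2.refit_count)  # 0 1
```

The round trip succeeds, but each unpickle triggers a fresh fit, which is exactly the overhead noted in the bullet about starmap.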

@ots22 ots22 changed the title MultiOutputGP functional on GPU MultiOutputGP emulators run in parallel on GPU Mar 3, 2021
@ots22 ots22 removed this from the Merge feature/gpu milestone Mar 3, 2021

ots22 commented Mar 10, 2021

Some more thoughts:

  • MultiOutputGP provides direct access to the individual emulators (as GaussianProcess objects), and some functionality depends on this (e.g. fitting), which would otherwise have to be duplicated. A MultiOutputGPGPU class should provide the same interface if possible.
  • Adding the ability to construct GaussianProcessGPU objects from DenseGP_GPU objects (via pybind), with minimal construction overhead, would allow the CUDA/C++ code to own the collection of emulators and return DenseGP_GPU objects to Python.
  • This means GaussianProcessGPU should not keep its own copy of inputs/targets (we'd need to rethink pickling).
  • It's unclear how to get this to work well with multiprocessing.
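The wrap-an-existing-device-object idea can be sketched from the Python side. All names here (DeviceGP, GPWrapper, MultiOutputGPGPU) are hypothetical; the point is only the ownership shape: the container owns the device objects (in the real design, on the C++ side), and wrappers hold no copy of inputs/targets.

```python
# Illustrative sketch, not the real mogp-emulator API.
class DeviceGP:
    """Stand-in for the pybind11-wrapped DenseGP_GPU."""
    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

class GPWrapper:
    """Lightweight view over an existing device object; constructing it is
    cheap because it stores only a reference, never a data copy."""
    def __init__(self, device_gp):
        self._device_gp = device_gp

    @property
    def targets(self):
        return self._device_gp.targets  # read through to device-owned data

class MultiOutputGPGPU:
    """Owns the collection of device GPs and hands back wrappers,
    mirroring MultiOutputGP's per-emulator access."""
    def __init__(self, inputs, target_lists):
        self._device_gps = [DeviceGP(inputs, t) for t in target_lists]

    def __getitem__(self, i):
        return GPWrapper(self._device_gps[i])

mogp = MultiOutputGPGPU([[0.0], [1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(mogp[0].targets)  # [1.0, 2.0]
```

Because the wrapper stores only a reference, pickling it would no longer capture the data, which is why rethinking pickling (and multiprocessing) comes up in the bullets above.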

Noting @edaub's recent changes in #178
