
Commit 58f6c2e

V0.2.0rc (#64)
* Add dimension reduction functionality (python impl. of gKDR)
* Tweaks to dimension reduction test
* Add documentation for DimensionReduction
* Add/fix some tests for DimensionReduction
* Fix test (DimensionReduction)
* Docstring -> raw string to fix invalid escape sequence warning
* Add sanity checks for parameters passed to DimensionReduction
* Fix assertion in DimensionReduction constructor
* Fix assertion in DimensionReduction constructor
* Test: optimize structural dimension
* First attempt at tuning the structural dimension (kdr)
* Make GaussianProcess objects callable (for prediction)
* Add 'train_model' interface to GaussianProcess
* DimensionReduction tweak to example
* Introduce X_scale/Y_scale parameters in gKDR kernel (DimensionReduction)
* Fix whitespace
* DimensionReduction: use specialized Hermitian eigensolver
* Tune parameters (structural dimension and kernel lengthscales) within gKDR
  - rename tune_structural_dimension -> tune_parameters
  - documentation
  - improved optimization routine
* Improved tests for parameter tuning (gKDR)
* Tweak to test (DimensionReduction)
* Whitespace cleanup
* Correct naming of variable (DimensionReduction)
* Factor out internal loss function from gKDR.tune_parameters
* Use a smaller test example to reduce test runtime
* MCMC (#33)
* added separate functions to calculated squared exponential kernel
* added matern 5/2 covariance function
* put kernel computations into a separate function and removed conjugate gradient based unit test that always gave problems
* moved kernel functions and tests to separate files
* added function to compute gradient of the squared exponential kernel
* changed GP class to use derivative function
* added derivatives for matern 5/2 kernel
* quick and dirty modification to GP in order to use kernel functions
* cleaned up distance calculation to use standardized euclidean distance
* modified fast GP in MICE code to use kernel interface
* made correction to meaning of nugget parameter for MICE candidate GP to be relative to current variance
* fixed minor issues in MICE design to allow for zero samples and ensuring that parameter values are correctly set
* updated MICE benchmark details
* cosmetic tweaks to MICE benchmark
* full hessian implementation in kernel functions
* refactored kernel functions into objects
* implemented Hessian computation into GP class
* Documented base kernel class
* Documented derived kernel classes
* added documentation pages for kernels
* corrected documentation to include newly implemented classes and fixed some old bugs
* renamed run_init_design to be consistent with other methods that use *_initial_design
* made minor change in MICEFastGP documentation
* broke up prediction methods into single and multiple parameter sets, plus some other changes needed to accomodate them
* added routine to compute local covariance matrix from hessian
* implemented approximate normal hyperparameter sampling
* added utility functions for MCMC sampling
* fully implemented basic MCMC sampler
* working MCMC implementation with full set of tests
* fixed a few bugs in GP and MCMC implmentation
* fixed bug in variance prediction where roundoff error can cause negative variance
* added docstrings for MCMC routines
* added documentation for MCMC-related methods and code additions
* created benchmark for MCMC sampling and added documentation pages for it
* added information on MCMC benchmark to readme
* added additional pages to documentation for MCMC sampling
* removed renamed mcmc benchmark file
* fixed MCMC docstring in GP class
* Fix whitespace in Makefile
* Forward kwargs (gKDR._compute_loss); correct number of cross-validation folds
* Add benchmark for gKDR
* Wrap long lines in docstrings
* Versioning (#38)
* added code needed for versioning to devel branch
* forgot to modify setup.py file
* corrected line accidentally deleted from __init__.py
* added prerelease number to devel branch to track commits on devel
* corrected comments in conf.py to reflect full release numbering
* added simple demos for GP and MICE (#46)
* added simple demos for GP and MICE
* incremented prerelease number for merge
* History Matching (#39)
* initial commit of history matching class and benchmark with minor tweaks
* broke benchmark and sanity checks into two files for history matching
* reindented code to use 4 spaces
* added unit tests and some bug fixes for HistoryMatching
* added tests for implausability plus some other checks and bug fixes in HistoryMatching
* fixed misspelling of implausibility
* changed file name for benchmark in makefile
* fixed documentation in HistoryMatching class to be consistent with others
* improved documentation, cleaned up code, added a few unit tests for HistoryMatching
* fixed some docstring formatting and base rst file for HistoryMatching
* full implementation of history matching with unit tests and documentation
* simplified model discrepancy based on discussion with Danny
* fixes to history matching file and tests
* fixed some comparisons with None in SequentialDesign
* broke up long test for Hessian into parts
* incremented prerelease for history match merge
* Feature/mucmtoolkit (#54)
* added toolkit with converted pages and images
* incremented version number
* Feature/mucmtoolkit (#55)
* fixed bug in documentation to display methods
* version number change for corrected PR
* merge input derivative bugfix into devel (#61)
* Fix/cachefactmat (#62)
* corrected GP class to cache factorized matrix rather than inverse plus cleaned up a few unneeded internal variables
* incremented prerelease version number
* missed a line that should have been deleted
* added test to confirm that variance predictions are stable
* fixed solve routines to use cho_solve in scipy
* Fix/toolkitcorr (#63)
* toolkit proofreading and corrections
* continuing updates of toolkit pages
* edits to toolkit pages
* finished corrections up through meta section
* updated toolkit threads section
* updates to proc section of toolkit
* incremented prerelease version number
* modified version for release v0.2.0
* Adjust the paper references in DimensionReduction.py
* Update paper reference in documentation
1 parent 9869534 commit 58f6c2e

265 files changed

Lines changed: 30368 additions & 642 deletions


.gitignore

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 *.pyc
 *.DS_Store
-*.png
+docs/_build/*
+mogp_emulator/tests/*.png
+mogp_emulator/tests/*.pdf
 mogp_emulator/version.py

README.md

Lines changed: 8 additions & 0 deletions
@@ -128,13 +128,21 @@ using 1, 2, 4, and 8 processess and notes the time required to perform the fitti
 will depend on the number of cores on the computer -- once you exceed the number of cores, the performance
 will degrade. As with the other benchmarks, Matplotlib can optionally be used to plot the results.
 
+##### MCMC Benchmark
+
+A benchmark applying the software to fitting an emulator with MCMC sampling is included. The code
+draws hyperparameter samples and compares the resulting posterior distributions with the values
+found via maximum likelihood estimation. If Matplotlib is installed, a histogram of the parameter
+samples is shown.
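The comparison this benchmark describes can be illustrated with a toy stand-in. The sketch below does not use the `mogp_emulator` API: it assumes a single hyperparameter (the log-variance of zero-mean Gaussian data), a flat prior, and a hand-written random-walk Metropolis sampler. All function and variable names here are hypothetical.

```python
import numpy as np

def log_likelihood(theta, y):
    """Log-likelihood (up to a constant) of zero-mean Gaussian data with variance exp(theta)."""
    return -0.5 * y.size * theta - 0.5 * np.sum(y ** 2) / np.exp(theta)

def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    samples = np.empty(n_steps)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal()
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=200)        # synthetic data, true variance 4
theta_mle = np.log(np.mean(y ** 2))       # closed-form MLE of the log-variance

# Flat prior assumed, so the log-posterior equals the log-likelihood
samples = metropolis(lambda t: log_likelihood(t, y), theta_mle, 5000, 0.2, rng)
posterior = samples[1000:]                # discard burn-in

print("MLE:", theta_mle)
print("posterior mean and sd:", posterior.mean(), posterior.std())
```

With this much data the posterior should concentrate near the maximum likelihood value, which is the kind of agreement a histogram of the samples would make visible.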
+
 ##### MICE Benchmark
 
 A benchmark comparing the MICE Sequential design method to Latin Hypercube sampling is also available.
 This creates designs of a variety of sizes and computes the error on unseen data for the 2D Branin
 function. It compares the accuracy of the sequential design to the Latin Hypercube for both the
 predictions and uncertainties.
 
+
 ### Documentation
 
 Building the documentation requires Sphinx/autodoc, which can be installed using `pip`. To build the documentation, first install Sphinx and change to the `docs` directory. There is a Makefile in the

docs/DimensionReduction.rst

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
.. _DimensionReduction:
2+
3+
*********************************
4+
The ``DimensionReduction`` module
5+
*********************************
6+
7+
.. automodule:: mogp_emulator.DimensionReduction
8+
9+
10+
---------------------------
11+
Dimension Reduction Classes
12+
---------------------------
13+
14+
.. autoclass:: mogp_emulator.gKDR
15+
16+
.. automethod:: __init__
17+
.. automethod:: __call__
18+
.. automethod:: tune_parameters
19+
20+
---------
21+
Utilities
22+
---------
23+
24+
.. automethod:: mogp_emulator.DimensionReduction.gram_matrix
25+
26+
.. automethod:: mogp_emulator.DimensionReduction.gram_matrix_sqexp
27+
28+
.. automethod:: mogp_emulator.DimensionReduction.median_dist
29+
30+
.. rubric:: References
31+
.. [LG17] Liu, Xiaoyu, and Serge Guillas. "Dimension reduction for Gaussian process emulation: An application to the influence of bathymetry on tsunami heights." SIAM/ASA Journal on Uncertainty Quantification 5.1 (2017): 787-812. https://epubs.siam.org/doi/10.1137/16M1090648
32+
.. [Fukumizu1] https://www.ism.ac.jp/~fukumizu/software.html
33+
.. [FL13] Fukumizu, Kenji and Chenlei Leng. "Gradient-based kernel dimension reduction for regression." Journal of the American Statistical Association 109, no. 505 (2014): 359-370

docs/HistoryMatching.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
.. _HistoryMatching:
2+
3+
**********************************
4+
The ``HistoryMatching`` Class
5+
**********************************
6+
7+
.. automodule:: mogp_emulator.HistoryMatching
8+
:noindex:
9+
10+
.. autoclass:: mogp_emulator.HistoryMatching
11+
:members:
12+
13+
.. automethod:: __init__

docs/MCMC.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
.. _MCMC:
2+
3+
**********************************
4+
The ``MCMC`` Module
5+
**********************************
6+
7+
.. automodule:: mogp_emulator.MCMC
8+
:members:

docs/benchmarks/benchmarks.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,3 +8,4 @@ GP Emulator Benchmarks
88
rosenbrock
99
branin
1010
tsunami
11+
mcmc_benchmark

docs/benchmarks/mcmc_benchmark.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
.. _mcmc_benchmark:
2+
3+
**********************************
4+
MCMC Benchmark
5+
**********************************
6+
7+
.. automodule:: mogp_emulator.tests.benchmark_MCMC

docs/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@
2626
# get version from package
2727
import mogp_emulator
2828
import re
29-
# The full version X.Y.Z
29+
# The full version X.Y.Z with development version if needed
3030
release = mogp_emulator.__version__
3131
# The short verion X.Y
3232
version = re.sub(r"(\d+\.\d+)", r"\1", mogp_emulator.__version__)
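Note that the `re.sub` call on the last context line replaces each match with its own first group, which is the match itself, so `version` ends up equal to the full release string, not `X.Y`. A minimal sketch of one way to extract the short form follows, assuming version strings always begin with `X.Y`. The helper name `short_version` is hypothetical and not part of the package.

```python
import re

def short_version(full_version):
    """Return the leading X.Y part of a version string such as '0.2.0'."""
    match = re.match(r"\d+\.\d+", full_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {full_version!r}")
    return match.group(0)

full = "0.2.0"

# The substitution used in conf.py leaves the string unchanged
unchanged = re.sub(r"(\d+\.\d+)", r"\1", full)

print(unchanged)            # same as `full`
print(short_version(full))  # leading X.Y only
```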

docs/index.rst

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,12 +11,21 @@ Welcome to Multi-Output GP Emulator's documentation!
1111
:caption: Contents:
1212

1313
GaussianProcess
14+
DimensionReduction
1415
MultiOutputGP
1516
Kernel
1617
ExperimentalDesign
1718
SequentialDesign
19+
HistoryMatching
20+
MCMC
1821
benchmarks/benchmarks
1922

23+
.. toctree::
24+
:maxdepth: 1
25+
:caption: Uncertainty Quantification Methods
26+
27+
methods/methods
28+
2029

2130

2231
Indices and tables

docs/methods/alt/AltBLPriors.rst

Lines changed: 142 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
.. _AltBLPriors:
2+
3+
Alternatives: Prior specification for BL hyperparameters
4+
========================================================
5+
6+
Overview
7+
--------
8+
9+
In the fully :ref:`Bayes linear<DefBayesLinear>` approach to
10+
emulating a complex :ref:`simulator<DefSimulator>`, the
11+
:ref:`emulator<DefEmulator>` is formulated to represent prior
12+
knowledge of the simulator in terms of a :ref:`second-order belief
13+
specification<DefSecondOrderSpec>`. The BL prior specification
14+
requires the specification of beliefs about some
15+
:ref:`hyperparameters<DefHyperparameter>`, as discussed in the
16+
alternatives page on emulator prior mean function
17+
(:ref:`AltMeanFunction<AltMeanFunction>`), the discussion page on the
18+
GP covariance function
19+
(:ref:`DiscCovarianceFunction<DiscCovarianceFunction>`) and the
20+
alternatives page on emulator prior correlation function
21+
(:ref:`AltCorrelationFunction<AltCorrelationFunction>`).
22+
Specifically, in the :ref:`core problem<DiscCore>` that is the
23+
subject of the core threads (:ref:`ThreadCoreBL<ThreadCoreBL>`,
24+
:ref:`ThreadCoreGP<ThreadCoreGP>`) a vector :math:`\beta` defines the
25+
detailed form of the mean function, a scalar :math:`\sigma^2` quantifies
26+
the uncertainty or variability of the simulator around the prior mean
27+
function, while :math:`\delta` is a vector of hyperparameters defining
28+
details of the correlation function. Threads that deal with variations
29+
on the basic core problem may introduce further hyperparameters.
30+
31+
A Bayes linear analysis requires hyperparameters to be given prior
32+
expectations, variances and covariances. We consider here ways to
33+
specify these prior beliefs for the hyperparameters of the core problem.
34+
Prior specifications for other hyperparameters are addressed in the
35+
relevant variant thread. Hyperparameters may be handled differently in
36+
the fully :ref:`Bayesian<DefBayesian>` approach - see
37+
:ref:`ThreadCoreGP<ThreadCoreGP>`.
38+
39+
Choosing the Alternatives
40+
-------------------------
41+
42+
The prior beliefs should be chosen to represent whatever prior knowledge
43+
the analyst has about the hyperparameters. However, the prior
44+
distributions will be updated with the information from a set of
45+
training runs, and if there is substantial information in the training
46+
data about one or more of the hyperparameters then the prior information
47+
about those hyperparameters may be irrelevant.
48+
49+
In general, a Bayes linear specification requires statements of
50+
second-order beliefs for all uncertain quantities. In the current
51+
version of this Toolkit, the Bayes linear emulation approach does not
52+
consider the situation where :math:`\sigma^2` and :math:`\delta` are
53+
uncertain, and so we require the following:
54+
55+
- :math:`\text{E}[\beta_i]`, :math:`\text{Var}[\beta_i]`,
56+
:math:`\text{Cov}[\beta_i,\beta_j]` - expectations, variances and
57+
covariances for each coefficient :math:`\beta_i`, and covariances
58+
between every pair of coefficients :math:`(\beta_i,\beta_j), i\neq j`
59+
- :math:`\sigma^2=\text{Var}[w(x)]` - the variance of the residual
60+
stochastic process
61+
- :math:`\delta` - a value for the hyperparameters of the correlation
62+
function
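To see how these three ingredients fit together, write the simulator output as :math:`f(x) = h(x)^T\beta + w(x)`, combining the regression form and the residual process :math:`w(x)` used on this page. As a sketch under assumptions this page does not state (that :math:`w(x)` has zero prior mean and is uncorrelated with :math:`\beta`, and writing :math:`c_\delta(x,x')` as illustrative notation for the correlation function with hyperparameters :math:`\delta`), the prior second-order specification for :math:`f(x)` would be:

```latex
\begin{aligned}
\text{E}[f(x)] &= h(x)^T\,\text{E}[\beta], \\
\text{Var}[f(x)] &= h(x)^T\,\text{Var}[\beta]\,h(x) + \sigma^2, \\
\text{Cov}[f(x), f(x')] &= h(x)^T\,\text{Var}[\beta]\,h(x') + \sigma^2\,c_\delta(x, x').
\end{aligned}
```

Setting :math:`x = x'` in the last line recovers the variance, since a correlation function satisfies :math:`c_\delta(x,x) = 1`.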
+
+The Nature of the Alternatives
+------------------------------
+
+Priors for :math:`\beta`
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Given a specified form for the basis functions :math:`h(x)` of :math:`m(x)` as
+described in the alternatives page on basis functions for the emulator
+mean (:ref:`AltBasisFunctions<AltBasisFunctions>`), we must specify
+expectation and variance for each coefficient :math:`\beta_i` and a
+covariance between every pair :math:`(\beta_i,\beta_j)`.
+
+As with the basis functions :math:`h(x)`, there are two primary means of
+obtaining a belief specification for :math:`\beta`.
+
+#. **Expert-led specification** - the specification can be made directly
+   by an expert using methods such as
+
+   a. Intuitive understanding of the magnitude and impact of the
+      physical effects represented by :math:`h(x)` leading to a direct
+      quantification of expectations, variances and covariances.
+   b. Assessing the difference between the model under study and another
+      well-understood model such as a fast approximate version or an
+      earlier version of the same simulator. In this approach, we can
+      combine the known information about the mean behaviour of the
+      second simulator with the belief statements about the differences
+      between the two simulators to construct an appropriate belief
+      specification for the hyperparameters -- see :ref:`multilevel
+      emulation<DefMultilevelEmulation>`.
+
+#. **Data-driven specification** - when prior beliefs are weak and we
+   have ample model evaluations, then prior values for :math:`\beta` are
+   typically not required and we can replace adjusted values for
+   :math:`\beta` with empirical estimates, :math:`\hat{\beta}`, obtained by
+   fitting the linear regression :math:`f(x)=h(x)^T\beta`. Our uncertainty
+   statements about :math:`\beta` can then be deduced from the "estimation
+   error" associated with :math:`\hat{\beta}`.
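The data-driven route can be sketched with ordinary least squares. The example below is an illustration, not Toolkit code: it assumes a linear mean with :math:`h(x) = (1, x)`, synthetic simulator runs, and independent equal-variance residuals, so that the usual formula for the covariance of the estimate applies. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "simulator runs": f(x) = 1 + 2x plus small residual variation
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
H = np.column_stack([np.ones(n), x])      # rows are h(x)^T with h(x) = (1, x)
beta_true = np.array([1.0, 2.0])
f = H @ beta_true + 0.1 * rng.standard_normal(n)

# Empirical estimate beta_hat from the regression f(x) = h(x)^T beta
beta_hat, *_ = np.linalg.lstsq(H, f, rcond=None)

# "Estimation error": covariance of beta_hat under the stated assumptions
resid = f - H @ beta_hat
q = H.shape[1]
sigma2_hat = resid @ resid / (n - q)
cov_beta_hat = sigma2_hat * np.linalg.inv(H.T @ H)

print("beta_hat:", beta_hat)
print("standard errors:", np.sqrt(np.diag(cov_beta_hat)))
```

Here `beta_hat` would stand in for the expectation of :math:`\beta` and `cov_beta_hat` for its variances and covariances.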
+
+Priors for :math:`\sigma^2`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The current version of the Toolkit requires a point value for the
+variance about the emulator mean, :math:`\sigma^2`. This corresponds
+directly to making a specification about :math:`\text{Var}[w(x)]`. As with
+the model coefficients above, there are two possible approaches to
+making such a quantification. An expert could make the specification by
+directly quantifying the magnitude of :math:`\sigma^2`. Alternatively, an
+expert assessment of the expected prior adequacy of the mean function at
+representing the variation in the simulator outputs can be combined with
+information on the variation of the simulator output, which allows for
+the deduction of a value of :math:`\sigma^2`. In the case of a data-driven
+assessment, the estimate for the residual variance :math:`\hat{\sigma}^2`
+can be used.
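Both routes to a point value can be sketched in a few lines. This is an illustration under assumptions: "adequacy of the mean function" is read here as the fraction of output variance the mean function is expected to explain, which is one possible interpretation and not a definition given on this page, and the data-driven estimate uses the usual unbiased residual variance from a least-squares fit. The function names are hypothetical.

```python
import numpy as np

def sigma2_from_adequacy(output_variance, explained_fraction):
    """Variance left over if the mean function explains the given fraction of output variance."""
    if not 0.0 <= explained_fraction <= 1.0:
        raise ValueError("explained_fraction must lie in [0, 1]")
    return (1.0 - explained_fraction) * output_variance

def sigma2_from_residuals(H, f):
    """Unbiased residual variance after a least-squares fit of f on the columns of H."""
    beta_hat, *_ = np.linalg.lstsq(H, f, rcond=None)
    resid = f - H @ beta_hat
    n, q = H.shape
    return resid @ resid / (n - q)

# Expert route: output variance judged to be 4, mean function expected to explain 75%
sigma2_expert = sigma2_from_adequacy(4.0, 0.75)

# Data-driven route on synthetic runs with residual standard deviation 0.5
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=200)
H = np.column_stack([np.ones_like(x), x])
f = H @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(x.size)
sigma2_hat = sigma2_from_residuals(H, f)

print(sigma2_expert, sigma2_hat)
```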
+
+In subsequent versions of the Toolkit, Bayes linear methods will be
+developed for :ref:`learning<DefBLVarianceLearning>` about
+:math:`\sigma^2` in the emulation process. This will require making prior
+specifications about the squared emulator residuals.
+
+Priors for :math:`\delta`
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Specification of correlation function hyperparameters is a more
+challenging task. Direct elicitation can be difficult as the
+hyperparameter :math:`\delta` is hard to conceptualise - the alternatives
+page on prior distributions for GP hyperparameters
+(:ref:`AltGPPriors<AltGPPriors>`) provides some discussion on this
+topic, with particular application to the Gaussian correlation function.
+Alternatively, given a large collection of simulator runs,
+:math:`\delta` can be crudely estimated using methods such as
+:ref:`variogram<ProcVariogram>` fitting on the empirical residuals.
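A crude variogram fit of this kind might look as follows. This is a sketch under stated assumptions, not the Toolkit's variogram procedure: inputs are one-dimensional, the correlation function is taken to be of Gaussian form :math:`\exp(-(d/\delta)^2)` so that the semivariogram is :math:`\sigma^2(1-\exp(-(d/\delta)^2))`, and :math:`\delta` is chosen by grid search. All names are hypothetical.

```python
import numpy as np

def empirical_variogram(x, resid, max_dist, n_bins=10):
    """Binned empirical semivariogram of residuals observed at 1-D inputs x."""
    i, j = np.triu_indices(x.size, k=1)           # all distinct pairs
    dist = np.abs(x[i] - x[j])
    semivar = 0.5 * (resid[i] - resid[j]) ** 2
    near = dist <= max_dist                       # ignore long-range pairs
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    which = np.clip(np.digitize(dist[near], edges) - 1, 0, n_bins - 1)
    counts = np.bincount(which, minlength=n_bins)
    sums = np.bincount(which, weights=semivar[near], minlength=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    filled = counts > 0                           # drop empty bins
    return centres[filled], sums[filled] / counts[filled]

def fit_delta(centres, gamma, sigma2, candidates):
    """Grid search for the delta minimising squared misfit to the Gaussian-form variogram."""
    model = sigma2 * (1.0 - np.exp(-(centres[None, :] / candidates[:, None]) ** 2))
    misfit = np.sum((model - gamma[None, :]) ** 2, axis=1)
    return candidates[np.argmin(misfit)]

# Synthetic residuals from a zero-mean process with Gaussian correlation, delta = 0.5
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, size=200))
K = np.exp(-((x[:, None] - x[None, :]) / 0.5) ** 2)
resid = np.linalg.cholesky(K + 1e-6 * np.eye(x.size)) @ rng.standard_normal(x.size)

centres, gamma = empirical_variogram(x, resid, max_dist=2.0)
candidates = np.linspace(0.05, 2.0, 40)
delta_hat = fit_delta(centres, gamma, resid.var(), candidates)
print("crude estimate of delta:", delta_hat)
```

The estimate depends on the bin width, the distance cut-off and the plug-in value used for :math:`\sigma^2`, which is one reason the text calls such estimates crude.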
+
+Assessing and updating uncertainties about :math:`\delta` raises both
+conceptual and technical problems, as methods that would be optimal for
+assessing such parameters given realisations drawn from a corresponding
+stochastic process may prove to be highly non-robust when applied to
+functional computer output that is only represented very approximately
+by such a process. Methods for approaching this problem will appear in a
+subsequent version of the Toolkit.
