Add support for long tails (and other distributions) to prior_from_idata #330
And here is the more general case:

```python
import numpy as np
import arviz
from scipy.interpolate import interp1d, InterpolatedUnivariateSpline
from scipy.stats import norm
import pymc as pm
from pymc.distributions.continuous import SplineWrapper


def compute_empirical_cdf(data):
    """Compute the empirical CDF of the given data."""
    sorted_data = np.sort(data)
    cdf = np.arange(0.5, len(sorted_data)) / len(sorted_data)
    # TODO: add the end points 0 and 1
    return sorted_data, cdf


def define_empirical_params_from_trace(trace, names, label):
    """Define prior parameters from an existing trace
    (must be called from a pymc.Model context).

    Parameters
    ----------
    trace : pymc.InferenceData from a previous sampling
    names : parameter names to be redefined in a correlated manner
    label : used to add a "{label}_coparams" dimension to the model, and to
        name a new "{label}_iid" Normal distribution

    Returns
    -------
    list of named tensors defined in the function

    Aim
    ---
    Conserve both the marginal distributions and the covariance across parameters.

    Method
    ------
    The specified parameters are extracted from the trace, their marginal
    distributions are normalized to N(0, 1), and their joint posterior is
    modelled as a multivariate normal (i.i.d. standard normals multiplied by
    the Cholesky factor of the empirical covariance). The parameters are
    subsequently transformed back:
    - the Normal CDF maps N(0, 1) to U(0, 1)
    - the inverse CDF of the interpolated empirical distribution maps U(0, 1)
      back to the original (empirical) distribution
    """
    # normalize the parameters so that their marginal distributions follow N(0, 1)
    params = arviz.extract(trace.posterior[names])
    fits = []
    for name in names:
        x = params[name].values
        # first transform the marginal distribution to a standard normal
        sorted_data, sorted_cdf = compute_empirical_cdf(x)
        # interpolate the CDF (outside the data range, clip to the end CDF values)
        cdf = interp1d(
            sorted_data,
            sorted_cdf,
            bounds_error=False,
            fill_value=(sorted_cdf[0], sorted_cdf[-1]),
        )
        x_uniform = cdf(x)              # U(0, 1) from the empirical distribution
        x_normal = norm.ppf(x_uniform)  # N(0, 1)
        # replace the original values in-place, so that np.cov below
        # operates on the normalized draws
        x[:] = x_normal
        # create the inverse CDF function with tensor values (thanks to SplineWrapper)
        icdf = SplineWrapper(InterpolatedUnivariateSpline(sorted_cdf, sorted_data, k=1, ext="const"))
        fits.append(icdf)

    # add a model dimension
    model = pm.modelcontext(None)
    dim = f"{label}_coparams"
    if dim not in model.coords:
        model.add_coord(dim, names)

    # define correlated standard normals via the Cholesky factor of the covariance
    cov = np.cov([params[name] for name in names], rowvar=True)
    chol = np.linalg.cholesky(cov)
    coparams = pm.Normal(label + "_iid", dims=dim)
    mv = chol @ coparams

    # transform back to the empirical distributions
    named_tensors = []
    for i, (name, icdf) in enumerate(zip(names, fits)):
        # Normal's CDF: N(0, 1) -> U(0, 1) (is there anything better than exp(logcdf)?)
        uniform_tensor = pm.math.exp(pm.logcdf(pm.Normal.dist(), mv[i]))
        tensor = icdf(uniform_tensor)
        named_tensors.append(pm.Deterministic(name, tensor))
    return named_tensors
```

This seems to work, and I believe it can be very useful to make sequential Bayesian constraining a reality.
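For context, here is a minimal usage sketch; the two-stage model and all variable names are hypothetical, just to show the calling convention:

```python
import pymc as pm

# first stage: sample some model and keep the trace
with pm.Model() as model1:
    a = pm.Normal("a")
    b = pm.LogNormal("b")
    pm.Normal("obs", mu=a + b, sigma=1.0, observed=[0.5, 1.2, 0.8])
    trace = pm.sample()

# second stage: re-use the posterior of a and b as a correlated, empirical prior
with pm.Model() as model2:
    a, b = define_empirical_params_from_trace(trace, ["a", "b"], label="stage1")
    # ... new likelihood terms built from a and b go here ...
```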
The approach looks very interesting. It resonates with the idea of having an additional postprocessing step for the trace before the MvNormal distribution is approximated, after the trace has been transformed and flattened. Indeed, normalization of the marginals makes a lot of sense, since we are going to calculate the Cholesky matrix for the covariance of the MvNormal. The transform of the posterior as described in this issue can happen right after the flattening step. Some additional information may need to be kept to make the backward transform possible. As pseudocode, I imagine it can look like this:
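Something like the following sketch, where `transform_and_flatten` and `fit_marginal` are illustrative placeholders rather than actual `prior_from_idata` internals:

```python
# pseudocode: normalize each marginal after the existing flattening step,
# and keep the state needed for the backward transform
flat = transform_and_flatten(idata.posterior)          # existing step
marginal_states = []
for j in range(flat.shape[1]):
    state = fit_marginal(flat[:, j])                   # e.g. empirical CDF, KDE, ...
    flat[:, j] = state.forward(flat[:, j])             # marginal -> N(0, 1)
    marginal_states.append(state)
chol = np.linalg.cholesky(np.cov(flat, rowvar=False))  # MvNormal approximation as before
# when building the prior, apply marginal_states[j].backward(...) to each
# component of the MvNormal draw
```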
However, it is somewhat in conflict with how the transform is handled there. So far the transform is simple, but some learnable one might be better suited: the existing code is very simple and assumes you are passing stateless transform bijectors. If a stateful transform is an option, it could look like this:
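A minimal sketch under that assumption; the class and its API are hypothetical, not an existing pymc interface:

```python
import numpy as np
from scipy.stats import norm


class EmpiricalNormalizingTransform:
    """Hypothetical stateful transform: fitted once on the posterior draws,
    then reused for both the forward and the backward pass."""

    def fit(self, draws):
        # initialize the constants used in forward and backward
        self.sorted_data = np.sort(draws)
        n = len(self.sorted_data)
        self.sorted_cdf = np.arange(0.5, n) / n

    def forward(self, x):
        # empirical distribution -> U(0, 1) -> N(0, 1)
        u = np.interp(x, self.sorted_data, self.sorted_cdf)
        return norm.ppf(u)

    def backward(self, z):
        # N(0, 1) -> U(0, 1) -> empirical distribution
        u = norm.cdf(z)
        return np.interp(u, self.sorted_cdf, self.sorted_data)
```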
This could initialize some constants to be used in the forward and backward passes. The same transform is reused in the backward pass, so you can safely assume the correct state is used to create the prior from the posterior. Only minor changes would be needed to the existing codebase; new transforms that have this feature would be a nice addition.
Thanks for paying attention and for laying out where what could go. Unfortunately, at the moment I don't have the resources (time) to invest in a proper implementation (I deleted my message from yesterday saying so by mistake).

One more thing: I have now performed a real-world experiment, and it took so long to run that it makes the approach useless for me. In my application, the multi-step approach took much longer than doing everything in one go, despite requiring fewer calculations in total. I'll run another experiment with more resources, for the sake of at least validating the results, but the performance issue is crippling (timeout at 11 hours with 5 hours left to go, versus 5h30 for the original one-step approach; that is surprising, because when I used the log-normal approximation posted some time ago it took less time, about 3 hours). I'm not sure whether I'll dedicate time to investigating what's going on, but I wanted to stress the possible performance issues.

What I envisage here is to carefully select which parameter needs which approximation: e.g. use the normal distribution by default, parametric distributions for things that look like them (for now I only implemented log-normal, as originally posted, because of its proximity to the normal distribution), and empirical distributions as above when there is no good match.
I could check in my real-world use case that the approach is working, which is the satisfying part. Unfortunately, further checks on better machines still yield a longer sampling time for the sequential sampling than for sampling everything together. There had been a speed-up (but inaccurate results) when using the normal and log-normal approximations, so the following lines are responsible for the slow sampling (a rewrite of the code above):

```python
normal_tensor = mv[i]
uniform_tensor = pm.math.exp(pm.logcdf(pm.Normal.dist(), normal_tensor))
tensor = SplineWrapper(InterpolatedUnivariateSpline(sorted_cdf, sorted_data, k=1, ext="const"))(uniform_tensor)
```

Perhaps the underlying gradient calculation is not great? I wonder if a Kernel Density Estimate (which is smoother) could yield faster results. Would that make the inverse CDF easier to calculate? @ricardoV94 @ferrine
[Screenshot from the kernel density estimation article on Wikipedia.]

One way forward would be to use statsmodels (or another tool such as scipy, I don't know) to fit the posterior from the trace with a normal kernel, then retrieve the individual kernels and build a custom distribution in pymc that is the sum of a set of Normal distributions. The missing link here is the inverse CDF: it's straightforward for a Normal distribution, but is it for a sum of Normals? The whole point of the approach would be to have something more efficient than the code above with SplineWrapper.
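For what it's worth, the CDF of an equal-weight Gaussian mixture is simply the average of the component Normal CDFs; the inverse has no closed form, but it can be obtained numerically by root-finding. A sketch of the idea (the bandwidth rule and helper names are my own assumptions, not a particular library's API):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq


def kde_cdf(x, centers, bandwidth):
    """CDF of an equal-weight Gaussian mixture: the mean of the kernel CDFs."""
    return norm.cdf((x - centers[:, None]) / bandwidth).mean(axis=0)


def kde_icdf(u, centers, bandwidth):
    """Inverse CDF by root-finding; no closed form exists for the mixture."""
    lo = centers.min() - 10 * bandwidth
    hi = centers.max() + 10 * bandwidth
    return np.array(
        [brentq(lambda x: kde_cdf(np.array([x]), centers, bandwidth)[0] - ui, lo, hi)
         for ui in np.atleast_1d(u)]
    )


draws = np.random.default_rng(0).lognormal(size=1_000)   # stand-in posterior sample
bw = 1.06 * draws.std() * len(draws) ** (-1 / 5)         # Silverman-style rule of thumb
print(kde_icdf(np.array([0.05, 0.5, 0.95]), draws, bw))  # quantiles of the smoothed posterior
```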
I would also look into the Box-Cox transform and related ones.
Can the spline be tweaked to have fewer knots? We can probably write it directly in PyTensor without a wrapper Op, which should be faster. Same for the KDE.
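A sketch of both ideas; the knot count is arbitrary, `draws` stands in for one parameter's posterior sample, and this assumes `pytensor.tensor.extra_ops.searchsorted` (the PyTensor counterpart of `np.searchsorted`) is available:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.tensor.extra_ops import searchsorted

# fewer knots: keep e.g. 32 quantiles of the empirical inverse CDF
# instead of one knot per posterior draw
draws = np.random.default_rng(0).normal(size=10_000)  # stand-in posterior sample
n_knots = 32
uk = np.linspace(0.0, 1.0, n_knots)
xk = np.quantile(draws, uk)


def linear_icdf(u, uk, xk):
    """Piecewise-linear inverse CDF written directly in PyTensor, no wrapper Op."""
    uk_t = pt.constant(uk)
    xk_t = pt.constant(xk)
    # locate the knot interval containing u, then interpolate linearly within it
    idx = pt.clip(searchsorted(uk_t, u), 1, n_knots - 1)
    u0, u1 = uk_t[idx - 1], uk_t[idx]
    x0, x1 = xk_t[idx - 1], xk_t[idx]
    return x0 + (u - u0) * (x1 - x0) / (u1 - u0)
```

The gradient of this piecewise-linear interpolation is just the local slope, which should be cheap to evaluate compared to calling back into the scipy spline through a wrapper Op.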
There is already one implemented: https://github.com/pymc-devs/pymc-experimental/blob/main/pymc_experimental/utils/spline.py
Hi, I wrote a small function that does a job similar to `prior_from_idata`, but can also handle log-normal distributions. @ricardoV94 suggested it could find its place in `pymc-experimental`, or at least contribute to the discussion.

As I discuss here (and in the copula references therein), the function could be extended to support any distribution, by first transforming the trace posterior to a uniform distribution, and then from uniform to normal. The approach could be applied whenever `pymc` implements the inverse CDF method. However, I don't know how that could (negatively) affect convergence. In particular, if `Interpolated` were to implement the inverse CDF method (it does not currently, but that is probably not a big deal to implement?), the approach could handle any distribution empirically, while taking into account the correlations across parameters (see the copula contributions here and there).
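For reference, the log-normal handling amounts to normalizing the marginal with a log transform instead of the empirical CDF; a minimal numpy sketch of the idea (not the actual function):

```python
import numpy as np

# forward: log-normal draws -> N(0, 1)
draws = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.5, size=1_000)
log_draws = np.log(draws)
mu, sigma = log_draws.mean(), log_draws.std()
z = (log_draws - mu) / sigma  # approximately N(0, 1) if the draws are log-normal

# backward: N(0, 1) -> log-normal, e.g. applied to components of the MvNormal
x = np.exp(mu + sigma * z)
```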