
Timeseries can be tricked by a couple bad first inputs #283

Open · Lnaden opened this issue Jan 22, 2018 · 15 comments

@Lnaden (Contributor) commented Jan 22, 2018

If you have one or two wildly off data points at the start of the timeseries, timeseries.detectEquilibration woefully underpredicts the true equilibration time. @jchodera requested I post sample data and a script to process and regenerate the plot and output below.

The output below was produced by the code that follows. The top plot is the full timeseries with bars showing where "equilibration" stops; the bottom plot is the same series from the 2nd sample onwards (series[1:]). Below the image is the code used to create it. The dataset is in the attached zip file. @jchodera If you only want the series to look at, you only need the imports and the first three lines of the code below.

Full Trajectory               -- Equilibration 0,  Subsample Rate 1.75, Num Effective 1710
Trajectory w/o initial sample -- Equilibration 20, Subsample Rate 3.75, Num Effective 794

import numpy as np
import matplotlib.pyplot as plt
from pymbar import timeseries

y = np.load('explicit0_timeseries.npy')
[n_equilibration, g_t, n_effective_max] = timeseries.detectEquilibration(y)
[n_short, g_t_short, n_eff_max_short] = timeseries.detectEquilibration(y[1:])

print("Full Trajectory -- Equilibration {0:d}, Subsample Rate {1:3.2f}, Num Effective {2:d}".format(
    n_equilibration, g_t, int(np.floor(n_effective_max))
))
print("Trajectory w/o initial sample -- Equilibration {0:d}, Subsample Rate {1:3.2f}, Num Effective {2:d}".format(
    n_short, g_t_short, int(np.floor(n_eff_max_short))
))

f, (a,b) = plt.subplots(2, 1)
x = np.arange(y.size)
a.plot(x, y, 'k-', label='Timeseries')
b.plot(x[1:], y[1:], '-k')
for p in [a, b]:
    ylim = p.get_ylim()
    xlim = p.get_xlim()
    p.set_xlabel('Iteration')
    p.set_ylabel(r'$\sum_k u_{k,k,n}$')
    p.vlines(n_equilibration, *ylim,
             colors='b', linewidth=1,
             label='Full Timeseries: Num Samples={}'.format(int(np.floor(n_effective_max))))
    p.vlines(n_short, *ylim,
             colors='r', linewidth=1,
             label='Timeseries[1:]: Num Samples={}'.format(int(np.floor(n_eff_max_short))))
    p.set_ylim(ylim)
    p.set_xlim(xlim)
a.legend()
f.savefig('bad_series.png', bbox_inches='tight')

explicit0_timeseries.npy.zip

@jchodera (Member) commented

Thanks! Looking into this now!

@jchodera (Member) commented

The issue seems to be that the estimate of the statistical inefficiency g plunges if the initial highly-atypical configuration is included:
image
This, in turn, causes the estimated number of effective snapshots to spike.
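
To make the effect easy to reproduce with the public API alone, here is a minimal sketch (pymbar 3 names, using the explicit0_timeseries.npy attachment above) that scans the first few candidate truncation indices the way detectEquilibration does internally:

import numpy as np
from pymbar import timeseries

y = np.load('explicit0_timeseries.npy')

# g and the effective sample size for the first few candidate truncation indices;
# detectEquilibration keeps the index that maximises N_eff.
for t in range(5):
    g = timeseries.statisticalInefficiency(y[t:], fast=False)
    n_eff = (y.size - t) / g
    print('t = {:d}   g = {:8.3f}   N_eff = {:10.1f}'.format(t, g, n_eff))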

@jchodera (Member) commented

If I change this line to use a more robust estimator of cov(A,B) by changing

sigma2_AB = (dA_n * dB_n).mean()  # standard estimator to ensure C(0) = 1

to

sigma2_AB = np.median(dA_n * dB_n)  # robust (median-based) estimator replacing the mean above

the desired behavior is restored:
image
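
As a standalone toy illustration (not the pymbar internals) of why the mean-based normalisation is so sensitive to a single atypical initial sample, while the median-based alternative is robust but biased low:

import numpy as np

rng = np.random.default_rng(0)
A_n = rng.normal(size=2000)
A_n[0] = 50.0                            # one wildly atypical initial sample
dA_n = A_n - A_n.mean()

sigma2_mean = (dA_n * dA_n).mean()       # standard estimator: inflated (roughly 2x here) by the outlier
sigma2_median = np.median(dA_n * dA_n)   # robust to the outlier, but ~0.45 for Gaussian data,
                                         # consistent with the underestimation noted in the next comment
print(sigma2_mean, sigma2_median)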

@jchodera (Member) commented

The number of uncorrelated samples seems to be underestimated by this change, but it gets the maximum in the desired place. I can then recompute the true g afterwards using the normal scheme.
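
A two-pass sketch of that idea (pymbar 3 API; detect_equilibration_robust is a hypothetical stand-in for whichever robust variant ends up being used):

from pymbar import timeseries

def two_pass_equilibration(A_n, detect_equilibration_robust):
    # Pass 1: locate the equilibration index with the robust variant (hypothetical helper).
    t0, _, _ = detect_equilibration_robust(A_n)
    # Pass 2: recompute the "true" g with the standard estimator on the equilibrated region only.
    g = timeseries.statisticalInefficiency(A_n[t0:], fast=False)
    n_effective = (A_n.size - t0) / g
    return t0, g, n_effective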

@jchodera (Member) commented

Another possibility is to only compute the following statistics over the last half of the timeseries:

    # Compute mean of each timeseries.
    half = int(len(A_n) / 2)
    mu_A = np.mean(A_n[half:])
    mu_B = np.mean(B_n[half:])
...
    # Compute estimator of covariance of (A,B) using estimator that will ensure C(0) = 1.
    sigma2_AB = np.mean(dA_n[half:] * dB_n[half:])  # standard estimator to ensure C(0) = 1

image
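
A self-contained sketch of this last-half variant (simplified relative to pymbar's statisticalInefficiency: single series, no fast mode):

import numpy as np

def statistical_inefficiency_last_half(A_n, mintime=3):
    # Estimate the statistical inefficiency g, but take the mean and variance from the
    # last half of the series only, as proposed above; the autocorrelation sum itself
    # still runs over the full series.
    A_n = np.asarray(A_n, dtype=np.float64)
    N = A_n.size
    half = N // 2
    mu_A = np.mean(A_n[half:])                       # mean over the last half only
    dA_n = A_n - mu_A
    sigma2_AB = np.mean(dA_n[half:] * dA_n[half:])   # variance over the last half only
    g = 1.0
    for t in range(1, N - 1):
        C = np.mean(dA_n[:N - t] * dA_n[t:]) / sigma2_AB   # normalised autocorrelation at lag t
        if C <= 0.0 and t > mintime:
            break
        g += 2.0 * C * (1.0 - float(t) / float(N))
    return max(1.0, g)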

@jchodera (Member) commented

I'll try these alterations in my original test set and see which choice minimally disrupts the desired optimal performance of the method.

@ocmadin commented Feb 8, 2018

Interested in the resolution of this issue as it is affecting me too. I can provide another example if that would be helpful!

@jchodera (Member) commented Feb 8, 2018

> Interested in the resolution of this issue as it is affecting me too. I can provide another example if that would be helpful!

Please do! The more, the better!

@ocmadin commented Feb 8, 2018

Okay, attached are a test script and data. This timeseries is from a simulation with data recorded every 500 steps, hence the length. Even though the first couple of values are much lower than the rest, detectEquilibration still decides that the first index in the equilibrium region is 0.

from pymbar import timeseries
import numpy as np
import matplotlib.pyplot as plt

data_om = np.load('energy_timeseries_om.npy')
# Simulation step numbers and data recorded every 500 steps
stepnumber = data_om[0, :]
energy_timeseries = data_om[1, :]
[t_equil, g, N_eff] = timeseries.detectEquilibration(energy_timeseries)
print("Equilibrated (according to function) at index %i" % t_equil)
plt.plot(stepnumber, energy_timeseries)
plt.title("Timeseries value vs. Step Number")
plt.show()

image

energy_timeseries_om.zip

@jchodera (Member) commented Feb 8, 2018

Thanks! This is super helpful @ocmadin!

@mrshirts (Collaborator) commented

Seems like this is also a duplicate of #277

@ocmadin commented Jul 13, 2018

@jchodera just wanted to bring this up again, as I have been getting outliers in heat capacity calculations from fluctuations, and this turned out to be the culprit. I have a temporary caveman solution, but would be interested in a more permanent (and elegant) solution.

@mrshirts (Collaborator) commented

Can you post a sample data set that is causing problems?

@ocmadin commented Jul 13, 2018

Okay, so here is a comparison of two datasets, one where timeseries.detectEquilibration works and one where it fails:

These are both 9 ns simulations of 200 cyclohexane molecules at 293.15 K and 1.01 bar.

In the first (cychex_timeseries_good_equil.npy), detectEquilibration works properly,

image

While in the second (cychex_timeseries_bad_equil.npy), it does not:

image

Interestingly, these two timeseries start at roughly equal volumes (0.02302 for the good one vs. 0.02304 for the bad one), but within the first 10 timesteps the "bad" timeseries hits a maximum value of 0.02341, whereas the "good" timeseries only reaches 0.02322:

image

It seems like this little spike is the difference between detectEquilibration working properly and failing.

cychex_timeseries_data.zip

@fjclark commented Oct 23, 2024

I've had a look at these with some of the equilibration heuristics we tested here. Specifically:

  • original_pymbar: equivalent to the standard PyMBAR detect_equilibration with fast=False.
  • pymbar_min_sse: uses the same method as original_pymbar to account for autocorrelation, but selects the equilibration time at the minimum of the marginal standard error (i.e. the squared standard error, SSE) rather than at the maximum effective sample size.
  • window_root_n: also determines equilibration according to the minimum marginal standard error, but accounts for autocorrelation using a window of size $\sqrt N$, where $N$ is the number of samples remaining for the truncation time being evaluated. This method seemed to be the most robust of those we tested.
  • mser: the original marginal standard error rule of White. This is equivalent to window_root_n with a window size of 1, i.e. calculating the standard error while completely ignoring autocorrelation (a simplified sketch of this rule follows the list).
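
For concreteness, a simplified sketch of the plain MSER rule (the window_root_n variant additionally accounts for autocorrelation with a window of size $\sqrt N$; the actual implementations are in the red package linked above):

import numpy as np

def mser_truncation_index(A_n, max_fraction=0.75):
    # White's marginal standard error rule: choose the truncation index t that minimises
    # the squared standard error of the mean of A_n[t:], ignoring autocorrelation
    # entirely (i.e. a window of size 1).
    A_n = np.asarray(A_n, dtype=np.float64)
    N = A_n.size
    # Only scan the early part of the series so the tail always keeps enough samples
    # for a stable variance estimate (max_fraction is an arbitrary choice here).
    candidates = np.arange(int(max_fraction * N))
    sse = np.array([np.var(A_n[t:], ddof=1) / (N - t) for t in candidates])
    return int(candidates[np.argmin(sse)])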

Naden Issue

As discussed in this issue, selecting the truncation point by the minimum marginal standard error, rather than the maximum effective sample size, increases sensitivity to initial bias and means that all methods other than original_pymbar remove the highly biased samples (truncation indices are shown in the legend):

image

pymbar_min_sse throws away a few more samples than might seem reasonable, but this is still a small fraction of the data:
image

Madin First Issue

Here, both the original_pymbar and pymbar_min_sse methods fail to discard the highly biased samples, while the other methods produce reasonable results:

image

Here, the window_root_n method throws away a few more samples than looks reasonable, but again this is a small fraction of the overall data:

image

To understand why using the minimum marginal standard error (or SSE, squared standard error) in the pymbar_min_sse method didn't fix the issue, I've plotted the ESS or 1/SSE for the first few indices for all methods (normalising so that the first value is 1):

image

The ESS (with the original PyMBAR autocorrelation treatment) shows a large spike at index 0. The ESS is given by ESS = n_actual * Var_uncor / Var_cor, where n_actual is the actual number of samples and "uncor" and "cor" denote the uncorrelated and correlated variance estimates. Plotting these two variance estimates (normalised so that the first term is 1) shows that the first few very biased samples dramatically increase Var_uncor but decrease Var_cor (by shifting the mean closer to a more central value for the later data):

image
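
One way to reproduce this decomposition with the standard pymbar estimator (the plots above use the red package, so the numbers will differ in detail): for each truncation index t, Var_uncor is the plain sample variance of the remaining data and Var_cor = g * Var_uncor, so ESS = n_actual * Var_uncor / Var_cor = n_actual / g.

import numpy as np
from pymbar import timeseries

def ess_decomposition(A_n, max_index=10):
    # Uncorrelated vs correlated variance estimates, and the resulting ESS, for the
    # first few candidate truncation indices.
    rows = []
    for t in range(max_index):
        tail = np.asarray(A_n[t:], dtype=np.float64)
        var_uncor = tail.var(ddof=1)
        g = timeseries.statisticalInefficiency(tail, fast=False)
        var_cor = g * var_uncor
        ess = tail.size * var_uncor / var_cor   # == tail.size / g
        rows.append((t, var_uncor, var_cor, ess))
    return rows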

Therefore, using 1/SSE = n_actual / Var_cor dramatically reduces the initial spike (pymbar_min_sse). However, it doesn't totally remove it, because the first correlated variance estimate is still low. As we move to methods that account for autocorrelation less thoroughly (pymbar_min_sse -> window_root_n -> mser), we go from using 1/SSE = n_actual / Var_cor to 1/SSE = n_actual / Var_uncor, which gradually removes the initial spike in 1/SSE and moves the equilibration time later.

Madin Second Issue (Bad Cyclohexane Series)

Here, I find that original_pymbar selects a truncation index of 2, rather than 0 as reported in the issue. I've checked with detectEquilibration(fast=False) from PyMBAR version 3.1.1 and also get 2. The first ~10 points are particularly biased, and all of the methods other than original_pymbar discard them:

image

Summary

  • Selecting the truncation point by the minimum marginal standard error, rather than the maximum effective sample size, increases sensitivity to the initial transient.
  • The window_root_n and mser methods seem to consistently remove the problematic samples. However, we'd generally recommend against MSER since it's prone to under-discarding when a trend in the average is hidden by noise (see here).

Code

All code and output: pymbar_equil_issues.tar.gz (you'll need to install red with mamba install -c conda-forge red-molsim).

Thanks!
