
Timeseries can be tricked by a couple bad first inputs #283

Open · Lnaden opened this issue Jan 22, 2018 · 15 comments

@Lnaden (Contributor) commented Jan 22, 2018

If you have one or two wildly off data points at the start of the timeseries, timeseries.detectEquilibration woefully underpredicts the true equilibration time. @jchodera requested I post sample data and a script to process and regenerate the plot and output below.

The output below was produced by the code that follows. The top plot is the full timeseries with bars showing where "equilibration" stops; the bottom plot is the same series from the 2nd sample onwards (series[1:]). Below the image is the code used to create it. The dataset is in the attached zip file. @jchodera If you only want the series to look at, you only need the imports and the first three lines of the code below.

Full Trajectory               -- Equilibration 0,  Subsample Rate 1.75, Num Effective 1710
Trajectory w/o initial sample -- Equilibration 20, Subsample Rate 3.75, Num Effective 794

import numpy as np
import matplotlib.pyplot as plt
from pymbar import timeseries

y = np.load('explicit0_timeseries.npy')
[n_equilibration, g_t, n_effective_max] = timeseries.detectEquilibration(y)
[n_short, g_t_short, n_eff_max_short] = timeseries.detectEquilibration(y[1:])

print("Full Trajectory -- Equilibration {0:d}, Subsample Rate {1:3.2f}, Num Effective {2:d}".format(
    n_equilibration, g_t, int(np.floor(n_effective_max))
))
print("Trajectory w/o initial sample -- Equilibration {0:d}, Subsample Rate {1:3.2f}, Num Effective {2:d}".format(
    n_short, g_t_short, int(np.floor(n_eff_max_short))
))

f, (a,b) = plt.subplots(2, 1)
x = np.arange(y.size)
a.plot(x, y, 'k-', label='Timeseries')
b.plot(x[1:], y[1:], '-k')
for p in [a, b]:
    ylim = p.get_ylim()
    xlim = p.get_xlim()
    p.set_xlabel('Iteration')
    p.set_ylabel(r'$\sum_k u_{k,k,n}$')
    p.vlines(n_equilibration, *ylim,
             colors='b', linewidth=1,
             label='Full Timeseries: Num Samples={}'.format(int(np.floor(n_effective_max))))
    p.vlines(n_short, *ylim,
             colors='r', linewidth=1,
             label='Timeseries[1:]: Num Samples={}'.format(int(np.floor(n_eff_max_short))))
    p.set_ylim(ylim)
    p.set_xlim(xlim)
a.legend()
f.savefig('bad_series.png', bbox_inches='tight')

explicit0_timeseries.npy.zip

@jchodera (Member) commented

Thanks! Looking into this now!

@jchodera (Member) commented

The issue seems to be that the estimate of the statistical inefficiency g plunges if the initial highly-atypical configuration is included:
image
This, in turn, causes the estimated number of effective snapshots to spike.
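
To make the effect easy to reproduce with the public API alone, here is a minimal sketch (pymbar 3 names, using the explicit0_timeseries.npy attachment above) that scans the first few candidate truncation indices the way detectEquilibration does internally:

import numpy as np
from pymbar import timeseries

y = np.load('explicit0_timeseries.npy')

# g and the effective sample size for the first few candidate truncation indices;
# detectEquilibration keeps the index that maximises N_eff.
for t in range(5):
    g = timeseries.statisticalInefficiency(y[t:], fast=False)
    n_eff = (y.size - t) / g
    print('t = {:d}   g = {:8.3f}   N_eff = {:10.1f}'.format(t, g, n_eff))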

@jchodera (Member) commented

If I change this line to use a more robust estimator of cov(A,B) by changing

sigma2_AB = (dA_n * dB_n).mean()  # standard estimator to ensure C(0) = 1

to

sigma2_AB = np.median(dA_n * dB_n)  # robust (median-based) estimator replacing the mean above

the desired behavior is restored:
image
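
As a standalone toy illustration (not the pymbar internals) of why the mean-based normalisation is so sensitive to a single atypical initial sample, while the median-based alternative is robust but biased low:

import numpy as np

rng = np.random.default_rng(0)
A_n = rng.normal(size=2000)
A_n[0] = 50.0                            # one wildly atypical initial sample
dA_n = A_n - A_n.mean()

sigma2_mean = (dA_n * dA_n).mean()       # standard estimator: inflated (roughly 2x here) by the outlier
sigma2_median = np.median(dA_n * dA_n)   # robust to the outlier, but ~0.45 for Gaussian data,
                                         # consistent with the underestimation noted in the next comment
print(sigma2_mean, sigma2_median)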

@jchodera (Member) commented

The number of uncorrelated samples seems to be underestimated by this change, but it gets the maximum in the desired place. I can then recompute the true g afterwards using the normal scheme.
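
A two-pass sketch of that idea (pymbar 3 API; detect_equilibration_robust is a hypothetical stand-in for whichever robust variant ends up being used):

from pymbar import timeseries

def two_pass_equilibration(A_n, detect_equilibration_robust):
    # Pass 1: locate the equilibration index with the robust variant (hypothetical helper).
    t0, _, _ = detect_equilibration_robust(A_n)
    # Pass 2: recompute the "true" g with the standard estimator on the equilibrated region only.
    g = timeseries.statisticalInefficiency(A_n[t0:], fast=False)
    n_effective = (A_n.size - t0) / g
    return t0, g, n_effective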

@jchodera (Member) commented

Another possibility is to only compute the following statistics over the last half of the timeseries:

    # Compute mean of each timeseries.
    half = int(len(A_n) / 2)
    mu_A = np.mean(A_n[half:])
    mu_B = np.mean(B_n[half:])
...
    # Compute estimator of covariance of (A,B) using estimator that will ensure C(0) = 1.
    sigma2_AB = np.mean(dA_n[half:] * dB_n[half:])  # standard estimator to ensure C(0) = 1

image
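
A self-contained sketch of this last-half variant (simplified relative to pymbar's statisticalInefficiency: single series, no fast mode):

import numpy as np

def statistical_inefficiency_last_half(A_n, mintime=3):
    # Estimate the statistical inefficiency g, but take the mean and variance from the
    # last half of the series only, as proposed above; the autocorrelation sum itself
    # still runs over the full series.
    A_n = np.asarray(A_n, dtype=np.float64)
    N = A_n.size
    half = N // 2
    mu_A = np.mean(A_n[half:])                       # mean over the last half only
    dA_n = A_n - mu_A
    sigma2_AB = np.mean(dA_n[half:] * dA_n[half:])   # variance over the last half only
    g = 1.0
    for t in range(1, N - 1):
        C = np.mean(dA_n[:N - t] * dA_n[t:]) / sigma2_AB   # normalised autocorrelation at lag t
        if C <= 0.0 and t > mintime:
            break
        g += 2.0 * C * (1.0 - float(t) / float(N))
    return max(1.0, g)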

@jchodera (Member) commented

I'll try these alterations in my original test set and see which choice minimally disrupts the desired optimal performance of the method.

@ocmadin commented Feb 8, 2018

Interested in the resolution of this issue as it is affecting me too. I can provide another example if that would be helpful!

@jchodera (Member) commented Feb 8, 2018

> Interested in the resolution of this issue as it is affecting me too. I can provide another example if that would be helpful!

Please do! The more, the better!

@ocmadin commented Feb 8, 2018

Okay, attached are a test script and data. This timeseries is from a simulation with data recorded every 500 steps, hence the length. Even though the first couple of values are much lower than the rest, detectEquilibration still decides that the first index in the equilibrium region is 0.

from pymbar import timeseries
import numpy as np
import matplotlib.pyplot as plt

data_om = np.load('energy_timeseries_om.npy')
# Simulation step numbers and data recorded every 500 steps
stepnumber = data_om[0, :]
energy_timeseries = data_om[1, :]
[t_equil, g, N_eff] = timeseries.detectEquilibration(energy_timeseries)
print("Equilibrated (according to function) at index %i" % t_equil)
plt.plot(stepnumber, energy_timeseries)
plt.title("Timeseries value vs. Step Number")
plt.show()

image

energy_timeseries_om.zip

@jchodera (Member) commented Feb 8, 2018

Thanks! This is super helpful @ocmadin!

@mrshirts (Collaborator) commented

Seems like this is also a duplicate of #277

@ocmadin commented Jul 13, 2018

@jchodera just wanted to bring this up again, as I have been getting outliers in heat capacity calculations from fluctuations, and this turned out to be the culprit. I have a temporary caveman solution, but would be interested in a more permanent (and elegant) solution.

@mrshirts (Collaborator) commented

Can you post a sample data set that is causing problems?

@ocmadin commented Jul 13, 2018

Okay, so here is a comparison of two datasets, one where timeseries.detectEquilibration works and one where it fails:

These are both 9 ns simulations of 200 cyclohexane molecules at 293.15 K and 1.01 bar.

In the first (cychex_timeseries_good_equil.npy), detectEquilibration works properly,

image

While in the second (cychex_timeseries_bad_equil.npy), it does not:

image

Interestingly, these two timeseries start at roughly equal volumes (0.02302 for the good one vs. 0.02304 for the bad one), but within the first 10 timesteps the "bad" timeseries hits a maximum value of 0.02341, whereas the "good" timeseries only reaches 0.02322:

image

It seems like this little spike is the difference between detectEquilibration working properly and failing.

cychex_timeseries_data.zip

@fjclark commented Oct 23, 2024

I've had a look at these with some of the equilibration heuristics we tested here. Specifically:

  • original_pymbar: equivalent to the standard PyMBAR detect_equilibration with fast=False.
  • pymbar_min_sse: uses the same method as original_pymbar to account for autocorrelation, but selects the equilibration time at the minimum of the marginal standard error (i.e. the squared standard error, SSE) rather than at the maximum effective sample size.
  • window_root_n: also determines equilibration according to the minimum marginal standard error, but accounts for autocorrelation using a window of size $\sqrt N$, where $N$ is the number of samples remaining for the truncation time being evaluated. This method seemed to be the most robust of those we tested.
  • mser: the original marginal standard error rule of White. This is equivalent to window_root_n with a window size of 1, i.e. calculating the standard error while completely ignoring autocorrelation (a simplified sketch of this rule follows the list).
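
For concreteness, a simplified sketch of the plain MSER rule (the window_root_n variant additionally accounts for autocorrelation with a window of size $\sqrt N$; the actual implementations are in the red package linked above):

import numpy as np

def mser_truncation_index(A_n, max_fraction=0.75):
    # White's marginal standard error rule: choose the truncation index t that minimises
    # the squared standard error of the mean of A_n[t:], ignoring autocorrelation
    # entirely (i.e. a window of size 1).
    A_n = np.asarray(A_n, dtype=np.float64)
    N = A_n.size
    # Only scan the early part of the series so the tail always keeps enough samples
    # for a stable variance estimate (max_fraction is an arbitrary choice here).
    candidates = np.arange(int(max_fraction * N))
    sse = np.array([np.var(A_n[t:], ddof=1) / (N - t) for t in candidates])
    return int(candidates[np.argmin(sse)])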

Naden Issue

As discussed in this issue, selecting the truncation point by the minimum marginal standard error, rather than the maximum effective sample size, increases sensitivity to initial bias and means that all methods other than original_pymbar remove the highly biased samples (truncation indices are shown in the legend):

image

pymbar_min_sse throws away a few more samples than might seem reasonable, but this is still a small fraction of the data:
image

Madin First Issue

Here, both the original_pymbar and pymbar_min_sse methods fail to discard the highly biased samples, while the other methods produce reasonable results:

image

Here, the window_root_n method throws away a few more samples than looks reasonable, but again this is a small fraction of the overall data:

image

To understand why using the minimum marginal standard error (or SSE, squared standard error) in the pymbar_min_sse method didn't fix the issue, I've plotted the ESS or 1/SSE for the first few indices for all methods (normalising so that the first value is 1):

image

The ESS (with the original PyMBAR autocorrelation treatment) shows a large spike at index 0. The ESS is given by ESS = n_actual * Var_uncor / Var_cor, where n_actual is the actual number of samples and "uncor" and "cor" denote the uncorrelated and correlated variance estimates. Plotting these two variance estimates (normalised so that the first term is 1) shows that the first few very biased samples dramatically increase Var_uncor but decrease Var_cor (by shifting the mean closer to a more central value for the later data):

image
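
One way to reproduce this decomposition with the standard pymbar estimator (the plots above use the red package, so the numbers will differ in detail): for each truncation index t, Var_uncor is the plain sample variance of the remaining data and Var_cor = g * Var_uncor, so ESS = n_actual * Var_uncor / Var_cor = n_actual / g.

import numpy as np
from pymbar import timeseries

def ess_decomposition(A_n, max_index=10):
    # Uncorrelated vs correlated variance estimates, and the resulting ESS, for the
    # first few candidate truncation indices.
    rows = []
    for t in range(max_index):
        tail = np.asarray(A_n[t:], dtype=np.float64)
        var_uncor = tail.var(ddof=1)
        g = timeseries.statisticalInefficiency(tail, fast=False)
        var_cor = g * var_uncor
        ess = tail.size * var_uncor / var_cor   # == tail.size / g
        rows.append((t, var_uncor, var_cor, ess))
    return rows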

Therefore, using 1/SSE = n_actual / Var_cor dramatically reduces the initial spike (pymbar_min_sse). However, it doesn't totally remove it, because the first correlated variance estimate is still low. As we move to methods that account for autocorrelation less thoroughly (pymbar_min_sse -> window_root_n -> mser), we go from using 1/SSE = n_actual / Var_cor to 1/SSE = n_actual / Var_uncor, which gradually removes the initial spike in 1/SSE and moves the equilibration time later.

Madin Second Issue (Bad Cyclohexane Series)

Here, I find that original_pymbar selects a truncation index of 2, rather than 0 as reported in the issue. I've checked with detectEquilibration(fast=False) from PyMBAR version 3.1.1 and also get 2. The first ~10 points are particularly biased, and all of the methods other than original_pymbar discard them:

image

Summary

  • Selecting the truncation point by the minimum marginal standard error, rather than the maximum effective sample size, increases sensitivity to the initial transient.
  • The window_root_n and mser methods seem to consistently remove the problematic samples. However, we'd generally recommend against MSER since it's prone to under-discarding when a trend in the average is hidden by noise (see here).

Code

All code and output: pymbar_equil_issues.tar.gz (you'll need to install red with mamba install -c conda-forge red-molsim).

Thanks!
