nan uncertainty estimates #428
Comments
Quick question: Can you confirm if this error can also be triggered using the gaussian_work test system?
This would be helpful in investigating the bias part of this issue, since I'm currently unclear on how G0 and G1 are derived in the case where the forward and reverse work mean and stddev all vary independently. If the error cannot be reproduced using the gaussian_work test system, it may be useful to expand this issue with a reproducing example in a different format, e.g. by providing a pair of reduced potential energy functions u_A and u_B, and accompanying samples from p_A and p_B. |
Oops yeah the estimates for G0 and G1 are clearly wrong here. Let me try to pare down from the actual distributions and not the work values. |
I've just suggested we also add a method to compute the number of effective samples: #427. I'm guessing these cases are seeing a collapse in the number of effective samples, meaning the estimates will be unreliable. We should add some protection to make sure we don't return nan.
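One common effective-sample-size estimator (not necessarily the one proposed in #427) is the Kish ESS computed on the importance weights exp(-w). A minimal sketch, assuming numpy only:

```python
import numpy as np

def kish_ess(log_weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), computed from
    log weights with the max subtracted for numerical stability."""
    lw = np.asarray(log_weights, dtype=float)
    lw = lw - lw.max()
    w = np.exp(lw)
    return w.sum() ** 2 / np.square(w).sum()
```

Uniform weights give ESS = N, while a single dominant weight collapses it toward 1, which is the regime where free energy estimates (and their uncertainties) become unreliable.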
@jchodera - see the discussion above - the example was using U_1 and U_2, not U_1-U2 and U_2-U1, so the results were nonsensical because the data was not physical. Good question about what should be returned in those cases, though, and if they should be trapped (if possible? Might be hard to distinguish in many cases from numerically bad data). |
Oh! Thanks for pointing that out, @mrshirts. Could we do a quick test of consistency with the CFT? If there is sufficient data (like there was here), it can quickly flag that there's an input issue. |
That's a great question. We should be able to take the log of the distribution ratio and get a straight line, and then test how far off it is. But the tolerance is a real question: HOW far off can it be? That will take some experimentation, which will take a little time.
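The straight-line check could be prototyped as follows: histogram P_F(W) and P_R(-W) on shared bins, fit ln of the count ratio against W, and compare the slope to 1. The Gaussian test data, bin count, and count threshold below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative forward/reverse reduced work samples from two Gaussians.
sigma_A, sigma_B = 1.0, 2.0
x_A = rng.normal(0.0, sigma_A, 200_000)
x_B = rng.normal(0.0, sigma_B, 200_000)
w_F = 0.5 * x_A**2 * (1.0 / sigma_B**2 - 1.0 / sigma_A**2)
w_R = 0.5 * x_B**2 * (1.0 / sigma_A**2 - 1.0 / sigma_B**2)

# Shared bins over the overlap region of W_F and -W_R.
lo = max(w_F.min(), (-w_R).min())
hi = min(w_F.max(), (-w_R).max())
bins = np.linspace(lo, hi, 41)
c_F, _ = np.histogram(w_F, bins=bins)
c_R, _ = np.histogram(-w_R, bins=bins)
centers = 0.5 * (bins[:-1] + bins[1:])

# Only fit bins with decent counts on both sides (threshold is a guess).
mask = (c_F > 100) & (c_R > 100)
slope, intercept = np.polyfit(centers[mask], np.log(c_F[mask] / c_R[mask]), 1)
# CFT predicts ln[P_F(W)/P_R(-W)] = W - dF: slope ~ 1, intercept ~ -dF.
```

With equal sample counts the count ratio equals the density ratio, so no kernel-bandwidth choice is needed, but the tail bins get noisy quickly, which is where the tolerance question bites.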
You're referring to tests like this paper, which plots ln[P_F(W)/P_R(-W)] against W. This could work, but only with a ton of samples, since histogram or kernel estimators for P_F(W) and P_R(-W) converge slowly. Lower-order moments of the work should be easier: the Crooks fluctuation theorem

P_F(W) / P_R(-W) = exp(W - dF)

can be expressed in terms of arbitrary functions f(W) of the work,

< f(W) exp(-W) >_F = exp(-dF) < f(-W) >_R

to give us a relationship between moments by choosing f(W) = W^n. (We may want to use central moments. These should be much better conditioned and easier to check for statistical violation.) We could truncate this sum at a number of moments that are sensible based on the quantity of data and signal a warning if things are clearly in violation. That should catch issues like this one.
|
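The moment relationship can be checked numerically. The CFT implies < f(W) exp(-W) >_F = exp(-dF) < f(-W) >_R for any function f. A sketch on synthetic Gaussian data (sigma values and sample counts are illustrative; a small perturbation is used so the exponential averages are well conditioned):

```python
import numpy as np

rng = np.random.default_rng(2)

# Mild perturbation so exp(-W) averages have finite variance (illustrative).
sigma_A, sigma_B = 1.0, 1.2
delta_f = np.log(sigma_A / sigma_B)  # exact reduced free energy difference

x_A = rng.normal(0.0, sigma_A, 500_000)
x_B = rng.normal(0.0, sigma_B, 500_000)
w_F = 0.5 * x_A**2 * (1.0 / sigma_B**2 - 1.0 / sigma_A**2)
w_R = 0.5 * x_B**2 * (1.0 / sigma_A**2 - 1.0 / sigma_B**2)

def moment_check(f):
    """Return both sides of < f(W) e^{-W} >_F = e^{-dF} < f(-W) >_R."""
    lhs = np.mean(f(w_F) * np.exp(-w_F))
    rhs = np.exp(-delta_f) * np.mean(f(-w_R))
    return lhs, rhs

checks = [moment_check(f) for f in
          (lambda w: np.ones_like(w),  # n = 0: Jarzynski-type identity
           lambda w: w,                # first moment
           lambda w: w**2)]            # second moment
```

For valid input both sides agree within sampling noise; for mislabeled or unphysical work values (like the raw energies in the original example) the low-order moments disagree badly.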
You can do maximum likelihood estimates of the line parameters; you don't need kernel density or histogram estimates. See the equivalent derivation for physical_validation: https://pubs.acs.org/doi/abs/10.1021/ct300688p. |
The probabilities are unknown, but the ratio is known (rather, known slope but unknown constant), so it works out. In ensemble validation the ratio is

ln[P_1(E) / P_0(E)] = (b_0 - b_1) E + const

and Crooks is:

ln[P_F(W) / P_R(-W)] = W - dF

So the same math should work. I can get this done for the 4.0 release (sometime this semester? A couple of big deadlines first). |
For certain types of overlapping distributions, the uncertainty estimate returned by BAR can be nan.

The distributions look like:

[figure: the two overlapping sample distributions]
Edit: G0 and G1 below are wrong; I confused the distributions, so there is probably no issue with the bias!
Furthermore, since the analytical partition functions are Z(sigma) = 1/(sigma * sqrt(2*pi)), using G = -ln(Z) the analytical free energies are G0 = 3.221 and G1 = 6.083, so the estimator also appears to be biased.