Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences between MBAR and BAR. #393

Open
AnguseZhang opened this issue Apr 28, 2020 · 13 comments
Open

Differences between MBAR and BAR. #393

AnguseZhang opened this issue Apr 28, 2020 · 13 comments
Labels

Comments

@AnguseZhang
Copy link

AnguseZhang commented Apr 28, 2020

When performing FEP, I found there are significant differences between MBAR and BAR, as the figure shows.

The system is benzol and oxylene in water (solvated).

image

When only putting data of two adjacent states into MBAR, MBAR results are the same with BAR.
But putting data of all states, the gap appears.

image

This is my code

from pymbar import MBAR as MBAR_
import numpy as np


N = [1000 for i in range(16)]
u_kn  = np.loadtxt("u.txt")	

mbar_ori = MBAR_(u_kn, N)
out = mbar_ori.getFreeEnergyDifferences(return_dict=True)
print(out["Delta_f"][0, -1])

This is original file. The result of MBAR is -3.88 kbT, i.e, -9.61852 kj /mol. But results of BAR and TI are nearly -17 kJ/mol.

u.txt

In my view, MBAR and BAR are both unbiased estimators, and thus the estimation (free energy difference ) should be the same. Could you please give me some instructions ? Thanks !

@jchodera
Copy link
Member

Thanks for the detailed report!

Could you provide the code and data needed to reproduce the BAR estimates too?

One thing we find often leads to this kind of error is incorrect units: BAR and MBAR will necessarily give almost identical output for two states no matter whether the energy units are correct (units of kT) or not, but once there are three states are more, the results will diverge if the energetics aren't self-consistent in the units you provide (which should be units of kT). One quick way to check this is to simply use exponential averaging (exp[-\Delta u(x)]) to reweight the p(u) from one sampled state to another sampled state and make sure they overlap. If the units are wrong, then it will be obvious the reweighted distribution does not overlap with the other state.

@AnguseZhang
Copy link
Author

Thanks very much for your reply!

This is the original dhdl.xvg files obatined from gromacs.

data.zip

Actually I first used alchemistry-analysis to obtain the original TI , BAR , MBAR results.
My code is
python alchemical_analysis.py -d . -q xvg -p dhdl -u kJ

image

After I found they are inconsistent, I tried to use pymbar directly , and gromacs gmx bar to obtain BAR results. This is gmx bar results.

image

Thanks for the detailed report!

Could you provide the code and data needed to reproduce the BAR estimates too?

One thing we find often leads to this kind of error is incorrect units: BAR and MBAR will necessarily give almost identical output for two states no matter whether the energy units are correct (units of kT) or not, but once there are three states are more, the results will diverge if the energetics aren't self-consistent in the units you provide (which should be units of kT). One quick way to check this is to simply use exponential averaging (exp[-\Delta u(x)]) to reweight the p(u) from one sampled state to another sampled state and make sure they overlap. If the units are wrong, then it will be obvious the reweighted distribution does not overlap with the other state.

@mrshirts
Copy link
Collaborator

mrshirts commented May 2, 2020

So, it's rather hard to debug something occurring in alchemical-analysis, since it's in python2, which is not supported anymore.

Could you try the same calculating with alchemlyb (https://alchemlyb.readthedocs.io/en/latest/)? And see what you get there?

@AnguseZhang
Copy link
Author

AnguseZhang commented May 4, 2020

So, it's rather hard to debug something occurring in alchemical-analysis, since it's in python2, which is not supported anymore.

Could you try the same calculating with alchemlyb (https://alchemlyb.readthedocs.io/en/latest/)? And see what you get there?

Thanks for your reply!
Actually I've tried alchemlyb before, and my results are the same.

image

`
#!/usr/bin/env python

import pandas as pd

from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
from alchemlyb.preprocessing import statistical_inefficiency
from alchemlyb.estimators import BAR, MBAR, TI

if name == "main":

import glob
import os
import argparse

parser = argparse.ArgumentParser(description="Perform a simple alchemical analysis")

parser.add_argument('datadir', type=str,
                    help="directory of XVG files")
parser.add_argument('-T', type=float,
                    help="Temperature (K) simulations performed at")

args = parser.parse_args()

# get a list of files from the given directory
files = glob.glob(os.path.join(args.datadir, '*.xvg*'))

# extract and subsample dHdl using statistical inefficiency
print("Gathering dHdl from {} files".format(len(files)))
#dHdl = [extract_dHdl(i, T=args.T) for i in files]
dHdl = pd.concat([extract_dHdl(i, T=args.T) for i in files])
print(files)
print("Subsampling dHdl on statistical inefficiency")
#dHdl = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in dHdl])

# extract and subsample u_nk using statistical inefficiency
print("Gathering u_nk from {} files".format(len(files)))
#u_nk = [extract_u_nk(i, T=args.T) for i in files]
u_nk = pd.concat([extract_u_nk(i, T=args.T) for i in files])
print("Subsampling u_nk on statistical inefficiency")
#u_nk = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in u_nk])

# fit dHdl estimators 
print("Fitting TI on dHdl")
ti = TI().fit(dHdl)

# fit u_nk estimators
print("Fitting BAR on u_nk")
bar = BAR().fit(u_nk)
print("Fitting MBAR on u_nk")
mbar = MBAR().fit(u_nk)

print("====== Results ======")

# print free energy estimates between lowest lambda and highest lambda
# these will all be in units of kT!
# NOTE: for BAR we would need to perform bootstrap sampling to get
# uncertainties of the endpoints
print("TI: {} +/- {} kT".format(ti.delta_f_.iloc[0, -1], ti.d_delta_f_.iloc[0, -1]))
print("BAR: {} +/- {} kT".format(bar.delta_f_.iloc[0, -1], "unknown"))
print("MBAR: {} +/- {} kT".format(mbar.delta_f_.iloc[0, -1], mbar.d_delta_f_.iloc[0, -1]))

`

@mrshirts
Copy link
Collaborator

mrshirts commented May 4, 2020

Thanks! That will make it a lot easier to debug. Could you perhaps upload that as a file (or a .zip, if can't be uploaded to make sure I get the indentation right? It didn't copy over correctly. Also, if you post the command you used.

@mrshirts
Copy link
Collaborator

mrshirts commented May 4, 2020

OK, I think I have it working now!

@AnguseZhang
Copy link
Author

AnguseZhang commented May 4, 2020

Thanks a lot ! These are the script and data.
My command is python3 alchemical-cal.py -T 298 data
alchemical-cal.py.zip
data.zip

@mrshirts
Copy link
Collaborator

mrshirts commented May 4, 2020

One question about the underlying data - I notice that the mass-dhdl is nonzero. Are the masses changing from state A to state B? If so, is the potential or total energy being printed out to the dhdl?

@AnguseZhang
Copy link
Author

One question about the underlying data - I notice that the mass-dhdl is nonzero. Are the masses changing from state A to state B? If so, is the potential or total energy being printed out to the dhdl?

The masses don't change from state A to state. In my mdp files, the settings are
fep-lambdas = 0.0 0.067 0.133 0.2 0.267 0.333 0.4 0.467 0.533 0.6 0.667 0.733 0.8 0.867 0.933 1.0 vdw-lambdas = 0.0 0.067 0.133 0.2 0.267 0.333 0.4 0.467 0.533 0.6 0.667 0.733 0.8 0.867 0.933 1.0 mass-lambdas = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

@mrshirts
Copy link
Collaborator

mrshirts commented May 4, 2020

In the topology, are the A and B states different masses? And it is the total or potential energy that is printed out? And if so, do any of the atoms that have masses changing have constraints?

The fact that the mass-lamdas are zero eliminates some of the concerns, but I'm just trying to eliminate the possibility there is something weird going on with the underlying GROMACS data. lambda-dependent masses haven't been tested nearly as much in GROMACS.

@AnguseZhang
Copy link
Author

In the topology, are the A and B states different masses? And it is the total or potential energy that is printed out? And if so, do any of the atoms that have masses changing have constraints?

The fact that the mass-lamdas are zero eliminates some of the concerns, but I'm just trying to eliminate the possibility there is something weird going on with the underlying GROMACS data. lambda-dependent masses haven't been tested nearly as much in GROMACS.

In this case, A is benzol and B is oxylene. So there are dummy atoms in A which transform into H atoms in B. There are slight differences between their masses.

屏幕快照 2020-05-05 上午3 31 24

I didn't set constraints in this case.
This is the topology file.

processed.top.zip

In dhdl.xvg file, I didn't find potential energy and total energy. But I can find them in log file. Are there some incorrect settings ?

image

@mrshirts
Copy link
Collaborator

mrshirts commented May 5, 2020

Ah, OK, the masses are definitely changing. You can check the mdout.mdp - depending on the version, there should be a dhdl-print-energy energy setting that is either total or potential. What does it say?

If it would be POSSIBLE to run this calculation again in 1) NVT and 2) with constant masses, that would be helpful for debugging. You can just leave the A masses constant; the free energies should be independent of the masses.

It may be something else, but these changes will eliminate some possibilities.

@mrshirts
Copy link
Collaborator

Did this ever get resolved?

@mrshirts mrshirts added the bug label Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants