Differences between MBAR and BAR. #393

AnguseZhang · 2020-04-28T19:31:38Z

When performing FEP, I found there are significant differences between MBAR and BAR, as the figure shows.

The system is benzol and oxylene in water (solvated).

When only putting data of two adjacent states into MBAR, MBAR results are the same with BAR.
But putting data of all states, the gap appears.

This is my code

from pymbar import MBAR as MBAR_
import numpy as np


N = [1000 for i in range(16)]
u_kn  = np.loadtxt("u.txt")	

mbar_ori = MBAR_(u_kn, N)
out = mbar_ori.getFreeEnergyDifferences(return_dict=True)
print(out["Delta_f"][0, -1])

This is original file. The result of MBAR is -3.88 kbT, i.e, -9.61852 kj /mol. But results of BAR and TI are nearly -17 kJ/mol.

u.txt

In my view, MBAR and BAR are both unbiased estimators, and thus the estimation (free energy difference ) should be the same. Could you please give me some instructions ? Thanks !

The text was updated successfully, but these errors were encountered:

jchodera · 2020-04-29T03:52:52Z

Thanks for the detailed report!

Could you provide the code and data needed to reproduce the BAR estimates too?

One thing we find often leads to this kind of error is incorrect units: BAR and MBAR will necessarily give almost identical output for two states no matter whether the energy units are correct (units of kT) or not, but once there are three states are more, the results will diverge if the energetics aren't self-consistent in the units you provide (which should be units of kT). One quick way to check this is to simply use exponential averaging (exp[-\Delta u(x)]) to reweight the p(u) from one sampled state to another sampled state and make sure they overlap. If the units are wrong, then it will be obvious the reweighted distribution does not overlap with the other state.

AnguseZhang · 2020-04-29T04:02:48Z

Thanks very much for your reply!

This is the original dhdl.xvg files obatined from gromacs.

data.zip

Actually I first used alchemistry-analysis to obtain the original TI , BAR , MBAR results.
My code is
python alchemical_analysis.py -d . -q xvg -p dhdl -u kJ

After I found they are inconsistent, I tried to use pymbar directly , and gromacs gmx bar to obtain BAR results. This is gmx bar results.

Thanks for the detailed report!

Could you provide the code and data needed to reproduce the BAR estimates too?

One thing we find often leads to this kind of error is incorrect units: BAR and MBAR will necessarily give almost identical output for two states no matter whether the energy units are correct (units of kT) or not, but once there are three states are more, the results will diverge if the energetics aren't self-consistent in the units you provide (which should be units of kT). One quick way to check this is to simply use exponential averaging (exp[-\Delta u(x)]) to reweight the p(u) from one sampled state to another sampled state and make sure they overlap. If the units are wrong, then it will be obvious the reweighted distribution does not overlap with the other state.

mrshirts · 2020-05-02T21:59:26Z

So, it's rather hard to debug something occurring in alchemical-analysis, since it's in python2, which is not supported anymore.

Could you try the same calculating with alchemlyb (https://alchemlyb.readthedocs.io/en/latest/)? And see what you get there?

AnguseZhang · 2020-05-04T16:22:28Z

So, it's rather hard to debug something occurring in alchemical-analysis, since it's in python2, which is not supported anymore.

Could you try the same calculating with alchemlyb (https://alchemlyb.readthedocs.io/en/latest/)? And see what you get there?

Thanks for your reply!
Actually I've tried alchemlyb before, and my results are the same.

`
#!/usr/bin/env python

import pandas as pd

from alchemlyb.parsing.gmx import extract_dHdl, extract_u_nk
from alchemlyb.preprocessing import statistical_inefficiency
from alchemlyb.estimators import BAR, MBAR, TI

if name == "main":

import glob
import os
import argparse

parser = argparse.ArgumentParser(description="Perform a simple alchemical analysis")

parser.add_argument('datadir', type=str,
                    help="directory of XVG files")
parser.add_argument('-T', type=float,
                    help="Temperature (K) simulations performed at")

args = parser.parse_args()

# get a list of files from the given directory
files = glob.glob(os.path.join(args.datadir, '*.xvg*'))

# extract and subsample dHdl using statistical inefficiency
print("Gathering dHdl from {} files".format(len(files)))
#dHdl = [extract_dHdl(i, T=args.T) for i in files]
dHdl = pd.concat([extract_dHdl(i, T=args.T) for i in files])
print(files)
print("Subsampling dHdl on statistical inefficiency")
#dHdl = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in dHdl])

# extract and subsample u_nk using statistical inefficiency
print("Gathering u_nk from {} files".format(len(files)))
#u_nk = [extract_u_nk(i, T=args.T) for i in files]
u_nk = pd.concat([extract_u_nk(i, T=args.T) for i in files])
print("Subsampling u_nk on statistical inefficiency")
#u_nk = pd.concat([statistical_inefficiency(i, i.iloc[:, 0]) for i in u_nk])

# fit dHdl estimators 
print("Fitting TI on dHdl")
ti = TI().fit(dHdl)

# fit u_nk estimators
print("Fitting BAR on u_nk")
bar = BAR().fit(u_nk)
print("Fitting MBAR on u_nk")
mbar = MBAR().fit(u_nk)

print("====== Results ======")

# print free energy estimates between lowest lambda and highest lambda
# these will all be in units of kT!
# NOTE: for BAR we would need to perform bootstrap sampling to get
# uncertainties of the endpoints
print("TI: {} +/- {} kT".format(ti.delta_f_.iloc[0, -1], ti.d_delta_f_.iloc[0, -1]))
print("BAR: {} +/- {} kT".format(bar.delta_f_.iloc[0, -1], "unknown"))
print("MBAR: {} +/- {} kT".format(mbar.delta_f_.iloc[0, -1], mbar.d_delta_f_.iloc[0, -1]))

`

mrshirts · 2020-05-04T16:28:19Z

Thanks! That will make it a lot easier to debug. Could you perhaps upload that as a file (or a .zip, if can't be uploaded to make sure I get the indentation right? It didn't copy over correctly. Also, if you post the command you used.

mrshirts · 2020-05-04T16:37:31Z

OK, I think I have it working now!

AnguseZhang · 2020-05-04T16:38:26Z

Thanks a lot ! These are the script and data.
My command is python3 alchemical-cal.py -T 298 data
alchemical-cal.py.zip
data.zip

mrshirts · 2020-05-04T17:17:36Z

One question about the underlying data - I notice that the mass-dhdl is nonzero. Are the masses changing from state A to state B? If so, is the potential or total energy being printed out to the dhdl?

AnguseZhang · 2020-05-04T18:07:09Z

One question about the underlying data - I notice that the mass-dhdl is nonzero. Are the masses changing from state A to state B? If so, is the potential or total energy being printed out to the dhdl?

The masses don't change from state A to state. In my mdp files, the settings are
fep-lambdas = 0.0 0.067 0.133 0.2 0.267 0.333 0.4 0.467 0.533 0.6 0.667 0.733 0.8 0.867 0.933 1.0 vdw-lambdas = 0.0 0.067 0.133 0.2 0.267 0.333 0.4 0.467 0.533 0.6 0.667 0.733 0.8 0.867 0.933 1.0 mass-lambdas = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

mrshirts · 2020-05-04T18:38:10Z

In the topology, are the A and B states different masses? And it is the total or potential energy that is printed out? And if so, do any of the atoms that have masses changing have constraints?

The fact that the mass-lamdas are zero eliminates some of the concerns, but I'm just trying to eliminate the possibility there is something weird going on with the underlying GROMACS data. lambda-dependent masses haven't been tested nearly as much in GROMACS.

AnguseZhang · 2020-05-04T19:39:41Z

In the topology, are the A and B states different masses? And it is the total or potential energy that is printed out? And if so, do any of the atoms that have masses changing have constraints?

The fact that the mass-lamdas are zero eliminates some of the concerns, but I'm just trying to eliminate the possibility there is something weird going on with the underlying GROMACS data. lambda-dependent masses haven't been tested nearly as much in GROMACS.

In this case, A is benzol and B is oxylene. So there are dummy atoms in A which transform into H atoms in B. There are slight differences between their masses.

I didn't set constraints in this case.
This is the topology file.

processed.top.zip

In dhdl.xvg file, I didn't find potential energy and total energy. But I can find them in log file. Are there some incorrect settings ?

mrshirts · 2020-05-05T00:33:40Z

Ah, OK, the masses are definitely changing. You can check the mdout.mdp - depending on the version, there should be a dhdl-print-energy energy setting that is either total or potential. What does it say?

If it would be POSSIBLE to run this calculation again in 1) NVT and 2) with constant masses, that would be helpful for debugging. You can just leave the A masses constant; the free energies should be independent of the masses.

It may be something else, but these changes will eliminate some possibilities.

mrshirts · 2022-06-17T18:02:10Z

Did this ever get resolved?

mrshirts added the bug label Jun 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Differences between MBAR and BAR. #393

Differences between MBAR and BAR. #393

AnguseZhang commented Apr 28, 2020 •

edited by jchodera

Loading

jchodera commented Apr 29, 2020

AnguseZhang commented Apr 29, 2020

mrshirts commented May 2, 2020

AnguseZhang commented May 4, 2020 •

edited

Loading

mrshirts commented May 4, 2020

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020 •

edited

Loading

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020

mrshirts commented May 5, 2020 •

edited

Loading

mrshirts commented Jun 17, 2022

Differences between MBAR and BAR. #393

Differences between MBAR and BAR. #393

Comments

AnguseZhang commented Apr 28, 2020 • edited by jchodera Loading

jchodera commented Apr 29, 2020

AnguseZhang commented Apr 29, 2020

mrshirts commented May 2, 2020

AnguseZhang commented May 4, 2020 • edited Loading

mrshirts commented May 4, 2020

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020 • edited Loading

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020

mrshirts commented May 4, 2020

AnguseZhang commented May 4, 2020

mrshirts commented May 5, 2020 • edited Loading

mrshirts commented Jun 17, 2022

AnguseZhang commented Apr 28, 2020 •

edited by jchodera

Loading

AnguseZhang commented May 4, 2020 •

edited

Loading

AnguseZhang commented May 4, 2020 •

edited

Loading

mrshirts commented May 5, 2020 •

edited

Loading