Free energy calculation with double system/single box simulation

Hello,

I want to calculate the relative binding free energy difference of a complex, with alchemical transformation using the double system/single box approach. I followed the tutorial on the pmx page (Mutation free energy calculations) but this covers a mutation in a single protein complex.

I am interested to see the free energy difference of a complex where the ligand has an arginine to alanine (R2A) mutation. Based on the previous tutorial and the Patel et al. (2021) paper (https://pubs.acs.org/doi/10.1021/acs.jctc.0c01045) I did the following:

  • Hybrid topology generation with the pmx server. Then, I build the boxes manually containing the receptor/wild-type ligand complex and the unbound mutated ligand, and vice-versa.
  • 100 ns equilibrium run. Then I discarded the first 20 ns and selected 100 snapshots from the rest of the run for the transition simulations.
  • For each snapshot, 500 ps forward and backward runs were performed changing the lambda from 0 → 1 and from 1 → 0 respectively.
  • Finally, I calculated the DeltaG with pmx analyse using the dhdl.xvg files for each snapshot run.

The analysis output is the following:

========================================================
                       ANALYSIS
 ========================================================
  Number of forward (0->1) trajectories: 100
  Number of reverse (1->0) trajectories: 100
  Temperature : 300.00 K

 --------------------------------------------------------
             Crooks Gaussian Intersection     
 --------------------------------------------------------
  CGI: Forward Gauss mean =  1396.12 kJ/mol std =    11.06 kJ/mol
  CGI: Reverse Gauss mean =   913.69 kJ/mol std =    30.94 kJ/mol
  CGI: dG =  1268.34 kJ/mol
  CGI: Std Err (bootstrap:parametric) =     8.11 kJ/mol
  CGI: Std Err (bootstrap) =    12.14 kJ/mol
    Forward: gaussian quality = 0.42
             ---> KS-Test Ok
    Reverse: gaussian quality = 0.81
             ---> KS-Test Ok

 --------------------------------------------------------
             Bennett Acceptance Ratio     
 --------------------------------------------------------
  BAR: dG =  1176.18 kJ/mol
  BAR: Std Err (analytical) = 17594679082837450.00 kJ/mol
  BAR: Std Err (bootstrap)  =     6.18 kJ/mol
  BAR: Conv =     1.00
  BAR: Conv Std Err (bootstrap) =     0.00

 --------------------------------------------------------
             Jarzynski estimator     
 --------------------------------------------------------
  JARZ: dG Forward =  1371.44 kJ/mol
  JARZ: dG Reverse =   980.92 kJ/mol
  JARZ: dG Mean    =  1176.18 kJ/mol
  JARZ: Std Err Forward (bootstrap) =     3.28 kJ/mol
  JARZ: Std Err Reverse (bootstrap) =    11.18 kJ/mol
  JARZ_Gauss: dG Forward =  1371.34 kJ/mol
  JARZ_Gauss: dG Reverse =  1107.57 kJ/mol
  JARZ_Gauss: dG Mean    =  1239.45 kJ/mol
  JARZ_Gauss: Std Err (analytical) Forward =     3.69 kJ/mol
  JARZ_Gauss: Std Err (analytical) Reverse =    27.73 kJ/mol
  JARZ_Gauss: Std Err Forward (bootstrap) =     4.61 kJ/mol
  JARZ_Gauss: Std Err Reverse (bootstrap) =    31.12 kJ/mol
 ========================================================

The value I got is 1176.18 ± 6.12 kJ/mol (281.11 ± 1.46 kcal/mol) which is much greater than expected. I am repeating the transition runs using 10 ns.

Attached are the plots and histograms of the W per snapshot (sorry, the legend covers the forward data).

The W histograms are very separated and do not intersect. Is this a sign of poor convergence or rather an issue with the system setup?

Should I take something else into account? Any suggestion will be very welcomed.

Best,
Yasser

The lack of overlap is indeed a sign of poor convergence. But there is more going on: With better convergence the value might shift to 1000 or 1300 but that would still be an unexpectedly large value. The large absolute value suggests that you might have a charge change after all, despite using a DSSB approach. Did you make sure that both systems move in opposite directions?

Thank you very much for your reply.

How can I check that? Attached is the dH/dl vs time plot for the last snapshot.

I would appreciate any tip or advice to analyze the convergence. At the moment I am repeating the runs, with a longer sampling and then, the transitions will be done during 10 ns.

Bests

what I meant was a sanity check of the topology. when you’re changing R2A in the one system (with lambda going from 0 to 1), at the same time you want to do A2R in the other. Is that’s how it’s defined in the topology?

Yes. The topologies are defined like that: R2A (0 → 1, bound ligand with R, unbound ligand with A) and A2R (1 → 0, bound ligand with A, unbound ligand with R).

Also, I am using GROMACS 2020.1. Is it fine or is there any suggested version for the calculations?

ok, the setup and gromacs version should be OK, but the large dG you’re obtaining is still worrying. It could be that a sampling issue would be responsible for such a large discrepancy: did you visually inspect the end structures of the transisions? For example, for the A2R cases, does the R sidechain grow into a realistic position?

If not, this could be a sign that this mutation is really destabilizing. Not by hundreds of kcal/mol, but by enough such that no reasonable solution is found for the mutated side chain. The simulation in such a case could wander off in an escape direction, not actually converging and thus reporting a bogus value.

When performing the transition do you actually go in the direction 0->1 and later in the direction 1->0 via the .mdp parameters? In other words, is your topology setup such that for the forward transition you required mdp settings: init-lambda=0, delta-lambda>0. And for reverse transition: init-lambda=1, delta-lambda<0?

If not, you need to reverse the sign of the backward transition: pmx analyse --reverseB

The free energy section in my mdp files are the following:

; Forward transition
free_energy     = yes
init_lambda     = 0
delta_lambda    = 4e-7
sc-alpha        = 0.3
sc-sigma        = 0.25
nstcalcenergy   = 1
nstdhdl         = 1

; Backward transition
free_energy     = yes
init_lambda     = 1
delta_lambda    = -4e-7
sc-alpha        = 0.3
sc-sigma        = 0.25
nstcalcenergy   = 1
nstdhdl         = 1

Is that OK?

Hi again,

A dumb question for just clarification. When building the systems, both bound and unbound ligands should have the R2A mutation, right?

Best

well, strictly one should have R2A and the other A2R

Right, thanks.

One more idea: in the dhdl plot you have a simulation of 500 ps. Assuming the timestep of 2 fs, it would make 2.5e5 nsteps. For that you would need to set delta_lambda to 4e-6, but in your .mdp it is set to 4e-7. Could this be the issue?

That could also be.
I am now re-running everything and my plan is to do both fast (500 ps) and slow (10 ns) transitions. In the last case that would mean 5e6 steps (timestep 2 fs), so delta_lambda would be 2e-7.