After running the ddG calculation tutorial (ligand_tutorial.ipynb), I got different result

Hello,

I practiced calculating ddG using the ligand_tutorial.ipynb from the pmx (developer) branch. In the tutorial, the expected result is -3.2 kJ/mol, and the example result on GitHub shows a similar value of -3.0 kJ/mol. However, when I ran the tutorial (using GROMACS version 2023.3), I got a significantly different result of -6.4 kJ/mol.

Additionally, the energy error was larger compared to the example. Could you help me figure out what might be causing this issue?

Thank you for taking the time to read my question.

Hi,

Did you use the same mdp files provided in the tutorial? I also assume the reported estimate (-6.4 kJ/mol) is the average of three repeats.

Although we didn’t see a gromacs version dependence of the free energy estimates earlier on tripeptide (please have a look at this discussion Challenges in Reproducing ddG Estimation Results for 1STN - #11 by mstieffe), you can also try using the same gromacs version as the tutorial (just to confirm).

Hello, thank you for the quick response.

  1. As you expected, -6.4 kJ/mol is the average of three independent runs. I recalculated from the em and eq steps and used the mdp files provided in the tutorial.
  2. Since the tutorial only provides the tpr file for ligand(water)_run1, I only calculated the transition and obtained a value of -9.1 kJ/mol (error = 0.35), which is similar to the tutorial result.
  3. Below are the results I obtained from my calculations and the precalculated analysis results.

(precalculated)
val err_analyt err_boot framesA framesB
edge_18625-1_18626-1_water_1 -8.850000 0.330000 0.300000 80.0 80.0
edge_18625-1_18626-1_protein_1 -13.460000 0.940000 0.710000 80.0 80.0
edge_18625-1_18626-1_water_2 -8.860000 0.320000 0.280000 80.0 80.0
edge_18625-1_18626-1_protein_2 -10.250000 0.550000 0.460000 80.0 80.0
edge_18625-1_18626-1_water_3 -9.150000 0.340000 0.390000 80.0 80.0
edge_18625-1_18626-1_protein_3 -12.160000 0.600000 0.590000 80.0 80.0
edge_18625-1_18626-1_water -8.953333 0.207047 0.207999 NaN NaN
edge_18625-1_18626-1_protein -11.956667 0.867509 0.833806 NaN NaN

(my results)
val err_analyt err_boot framesA framesB
edge_18625-1_18626-1_water_1 -9.640000 0.360000 0.300000 80.0 80.0
edge_18625-1_18626-1_protein_1 -15.360000 0.680000 0.570000 80.0 80.0
edge_18625-1_18626-1_water_2 -8.370000 0.350000 0.330000 80.0 80.0
edge_18625-1_18626-1_protein_2 -17.680000 0.500000 0.430000 80.0 80.0
edge_18625-1_18626-1_water_3 -9.090000 0.400000 0.390000 80.0 51.0
edge_18625-1_18626-1_protein_3 -13.300000 0.570000 0.560000 80.0 80.0
edge_18625-1_18626-1_water -9.033333 0.364883 0.360815 NaN NaN
edge_18625-1_18626-1_protein -15.446667 1.078985 1.078918 NaN NaN

It seems that the largest differences are in the protein-ligand complex calculations.

I used the same input files, only reran mdrun.
Could you help me understand why there’s such a significant difference in the results?

Thank you for your assistance.

Hi,

The issue seems to arise from the run2 of the protein-ligand complex, with dG of -17.68 kJ/mol. Is there any sudden change in the evolution of work values in the “wplot.png” for this run (it could happen due to conformational changes)? Is it possible for you to upload the “wplot.png” files from the three runs of the protein-ligand complex?

I would like to attach the wplot.png files, but as a new user, I am unable to upload files.
Instead, I am sharing a Google Drive link.

https://drive.google.com/drive/folders/18ZiJSVLaW8SaaYaQkHheTarnjTUEcNjN?usp=sharing

Could you explain what it means when you say that the work values can suddenly change due to a conformational change?

Thank you so much for your help!

Hi,

I meant similar to Figure 11a of this paper (https://pubs.acs.org/doi/full/10.1021/acs.jpcb.0c10263).

I don’t see anything like that in your plots.
I’ll try to reproduce your results and get back.

Thanks,
Sudarshan

Hi,

I was trying to reproduce your results and realised that one of the tutorial values (-10.25) was a statistical fluke: it was sampled from the tail of the distribution, as you can see from the attached plot. All your values (Soymilk) fall within the distribution.

We’ll update the tutorial to have a better representative value.

Thanks for pointing out the issue.
Sudarshan

Sudchem,
thank you so much for such a thorough response, complete with simulation analyses!
I just wanted to ask to make sure I was using pmx correctly, but you went above and beyond by running multiple simulations and even providing a beautifully detailed energy distribution plot.
It really helped make things clearer.
I’m grateful for your kind and generous help!