Challenges in Reproducing ddG Estimation Results for 1STN


I am reaching out for assistance regarding an issue I encountered while attempting to reproduce the ddG estimation results for 1STN, as detailed in your paper, Accurate and Rigorous Prediction of the Changes in Protein Free Energies in a Large-Scale Mutation Scan .

Following the methodology outlined in the supporting information of your paper, I conducted a 20 ns equilibration simulation for the wildtype 1STN. After discarding the initial 2 ns of the simulation, I used the remaining trajectory to extract 100 equidistant snapshots. These snapshots were then mutated to generate the B state, followed by a further 20 ps equilibration simulation. Subsequently, I executed 50 ps alchemical transitions between the A and B states.

Enclosed are the mdp files utilized for these simulations:
md_mut.mdp (8.0 KB)
md_wt.mdp (7.9 KB)
morphA.mdp (8.0 KB)
morphB.mdp (8.0 KB)

Despite closely following the protocol, my results diverge significantly from those reported in your paper, as illustrated in the attached correlation plot:

A notable deviation in my replication attempt is the use of Gromacs version 2023.4, whereas the study utilized versions 4.6 and 5.0, which might have employed an alternative soft-core function.

I am eager to understand whether the discrepancy in Gromacs versions could be a contributing factor to the differences in our results. Additionally, I would greatly appreciate any insights or suggestions that could guide me in aligning my results more closely with those reported in your study.

Thank you for your time and consideration.

Best regards,

Hi Marc,

as you already noted, the employed softcore function might very well be the cause. We’ve especially developed the linearized softcore function to alleviate issues with the standard softcore function, especially in non-equilibrium free energy simulations.

Therefore, that would be the suggested first check.



1 Like

Hey Marc,

the protocol sounds different to that used in the publication. It is necessary to obtain equilibrated ensembles at the WT and MUT states, i.e. you need to simulate for 20 ns in the WT and in the MUT states. Then extract the snapshots from each of the ensembles and perform the transitions.


Dear Bert and Vytas,

I appreciate your prompt responses.

As you’ve mentioned, the softcore function currently in use may be contributing to the issue. Our linearized softcore function was specifically designed to address problems with the standard softcore function, particularly in non-equilibrium free energy simulations.

I will proceed with re-running the simulations using the GROMACS version available on your website. Is it correct to assume that only the CPU version is currently offered?

Your protocol seems to deviate from the one described in the publication. To obtain equilibrated ensembles for both WT and MUT states, you should perform 20 ns simulations for each state. Afterward, extract snapshots from each ensemble and carry out the transitions.

I apologize for the oversight; I may have misunderstood the information provided in the supporting documentation.

In cases involving barnase, staphylococcal nuclease, tripeptides, and protein-protein binding simulations, hybrid structures for the mutated residues were incorporated into the snapshots. Each system was then energy minimized and subjected to a 20 ps equilibrium simulation.

Despite this, I have tested an alternative protocol for several proteins, which involved running 20 ns equilibration simulations for both the WT and MUT proteins. Interestingly, the results were quite similar to those obtained with the previous protocol.

Thank you once again for your assistance,

Regarding softcore, you can use the latest gromacs version and set the mdp option: sc-function=gapsys

One more thing, in your plot you show values in kcal/mol. If I remember correctly, in the paper we reported values in kJ/mol. Could it be that there is a unit conversion issue somewhere?

Regarding softcore, you can use the latest gromacs version and set the mdp option: sc-function=gapsys

Awesome, i didn’t know that. Thanks!

One more thing, in your plot you show values in kcal/mol. If I remember correctly, in the paper we reported values in kJ/mol. Could it be that there is a unit conversion issue somewhere?

Good point, but I just used the wrong unit in the plot. It should be kJ/mol.

I gonna rerun using the sc potential and keep you updated once I have the results. Thanks for your help.


I recently re-ran a subset of the 1stn dataset using the softcore potential with function=gapsys. This led to a significant change in the results, which now fit better into the expected range. However, the correlation remains poor.

In addition, I recalculated the delta G values for the tripeptides, and I found that these values also differ substantially from the values reported on your website. For instance, using the amber99sb_ildn_star forcefield, I obtained a value of 80.12 kJ/mol for V2A, while the website lists it as 70.48 kJ/mol.

I would greatly appreciate any advice regarding potential pitfalls or suggestions on where to proceed from here to resolve these discrepancies.

Thank you for your assistance,


Are you using python3 based ‘develop’ branch of pmx? If so, it has different mutation libraries than the ones on the webserver, which are based on the ‘master’ pmx branch. The tripeptide values on the webserver will have different dG values to the those calculated with the ‘develop’ branch.

Thank you for the helpful information. I was initially using the develop branch of pmx for my simulations. However, after switching to the master branch and re-running some tripeptide simulations (using sc-function=gapsys), I noticed mixed results. For certain cases (V2A, L2A), there is good agreement with the values on the webserver, but for others (V2A, I2A), the agreement remains quite poor. This discrepancy is puzzling.

I’m now considering whether some of my mdp options might be causing these inconsistencies. Therefore, it would be great if you provided an example file that was used for the 1stn study (and maybe also for the tripeptide)? Additionally, I would like to know if the master or develop branch of pmx was utilized for the 1stn study.

Thank you very much for your assistance.

Best regards,

How large are the differences you observe? For which force field do they manifest?

Here are the mdp files: equilibrium runs, velocity equilibration before the transition, transition run. Note that these simulations were done with an old gmx4.6 version

eq.mdp (7.7 KB)
eq_20ps.mdp (7.7 KB)
eq_ti_l0.mdp (7.6 KB)

Thank you very much for providing the mdp files. However, I have some questions regarding the velocity equilibration before the transition. In my current setup, I perform 20 ns equilibration runs for both the WT and MUT states. I then extract 100 equidistant frames from each of these runs and carry out a 50 ps transition for WT->MUT and MUT->WT, respectively. Is it necessary to perform velocity equilibration before the transitions in this setup?

I used your mdp files (for equilibration and transition) with GROMACS 2023.5 and the master branch of PMX, deploying the amber99sb-star-ildn-mut forcefield. I completed four runs each for F2A and I2A, obtaining -37.05±0.24 kJ/mol and -12.51±0.26 kJ/mol, respectively. The webserver reports -31.95±0.39 kJ/mol and -9.64±0.33 kJ/mol. Although the deviations are not overly large, they are still significant and do not appear to be outliers, as I consistently obtained these values over multiple separate runs. Interestingly, for the other two tripeptides I tested (V2A and L2A), I was able to perfectly reproduce the results from the webserver.

  1. Velocity equilibration not necessary if you have extracted them into .gro files. The setup used in the paper simulated WT and Mut states without the hybrid topology. Hybrid structures were created on the extracted snapshots, so velocity equilibration was needed.

  2. Difficult to say: gmx version change might have an effect, as there have been many modifications to the free energy code over the years. To completely eliminate any possible discrepancy between pmx versions, could you try running with the topology and structure for F2A provided by the webserver?

Thank you for the clarification. I plan to conduct another simulation using the structure and topology generated by the webserver, utilizing the GMX2023.5 version. Moreover, I will also perform a simulation with the GROMACS 4.6 version available on your website. If there are any differences between these two simulations, it would indicate that the discrepancy stems from the GROMACS version being used.

Here are my results: I was running F2A tripeptide only. When using the GROMACS 4.6 version available on your website I obtain dG=-35.09 kJ/mol, and with the GROMACS 2023.5 I obtain dG=-36.02 kJ/mol. In both cases i used the structure and topology file from the webserver and the mdp files that you have provided. For the GROMACS 2023 run I adapted the mdp file to use the gapsys sc-function and set the electrostatics/VDW parameters to

coulombtype              = PME
rcoulomb-switch          = 0
rcoulomb                 = 1.2
vdw-type                 = Cut-off
vdw-modifier             = Potential-switch
rvdw-switch              = 1.0 
rvdw                     = 1.2

I somehow consistently underestimate dG for F2A compared to the value reported on your website. Can that still just be a statistical outlier?


Hi Marc,

The tripeptide values reported in the webserver seem to be well-reproducible using the structure and topology files (and mdp files from the Download section) from the webserver. The modifications I made to mdp files are: cutoff-scheme changed from group to verlet and rdvw set to 1.2nm. Here are my results of 5 independent replicas for each tripeptide, each simulated for 6 ns and 80 frames extracted from the last 4 ns, followed by 100ps short transitions. The gromacs versions used are 5.1, 2018.6 and 2023.4.

The small difference between the new values and the reported ones could be because of the small difference in the mdp options, a slightly different setup and the number of independent replica (just one for the value reported in webserver).

I would suggest to use the mdp files from “Download” section and multiple independent replicas to test the reproducibility.

I hope it helps.