Challenges in Reproducing ddG Estimation Results for 1STN


I am reaching out for assistance regarding an issue I encountered while attempting to reproduce the ddG estimation results for 1STN, as detailed in your paper, Accurate and Rigorous Prediction of the Changes in Protein Free Energies in a Large-Scale Mutation Scan .

Following the methodology outlined in the supporting information of your paper, I conducted a 20 ns equilibration simulation for the wildtype 1STN. After discarding the initial 2 ns of the simulation, I used the remaining trajectory to extract 100 equidistant snapshots. These snapshots were then mutated to generate the B state, followed by a further 20 ps equilibration simulation. Subsequently, I executed 50 ps alchemical transitions between the A and B states.

Enclosed are the mdp files utilized for these simulations:
md_mut.mdp (8.0 KB)
md_wt.mdp (7.9 KB)
morphA.mdp (8.0 KB)
morphB.mdp (8.0 KB)

Despite closely following the protocol, my results diverge significantly from those reported in your paper, as illustrated in the attached correlation plot:

A notable deviation in my replication attempt is the use of Gromacs version 2023.4, whereas the study utilized versions 4.6 and 5.0, which might have employed an alternative soft-core function.

I am eager to understand whether the discrepancy in Gromacs versions could be a contributing factor to the differences in our results. Additionally, I would greatly appreciate any insights or suggestions that could guide me in aligning my results more closely with those reported in your study.

Thank you for your time and consideration.

Best regards,

Hi Marc,

as you already noted, the employed softcore function might very well be the cause. We’ve especially developed the linearized softcore function to alleviate issues with the standard softcore function, especially in non-equilibrium free energy simulations.

Therefore, that would be the suggested first check.



1 Like

Hey Marc,

the protocol sounds different to that used in the publication. It is necessary to obtain equilibrated ensembles at the WT and MUT states, i.e. you need to simulate for 20 ns in the WT and in the MUT states. Then extract the snapshots from each of the ensembles and perform the transitions.


Dear Bert and Vytas,

I appreciate your prompt responses.

As you’ve mentioned, the softcore function currently in use may be contributing to the issue. Our linearized softcore function was specifically designed to address problems with the standard softcore function, particularly in non-equilibrium free energy simulations.

I will proceed with re-running the simulations using the GROMACS version available on your website. Is it correct to assume that only the CPU version is currently offered?

Your protocol seems to deviate from the one described in the publication. To obtain equilibrated ensembles for both WT and MUT states, you should perform 20 ns simulations for each state. Afterward, extract snapshots from each ensemble and carry out the transitions.

I apologize for the oversight; I may have misunderstood the information provided in the supporting documentation.

In cases involving barnase, staphylococcal nuclease, tripeptides, and protein-protein binding simulations, hybrid structures for the mutated residues were incorporated into the snapshots. Each system was then energy minimized and subjected to a 20 ps equilibrium simulation.

Despite this, I have tested an alternative protocol for several proteins, which involved running 20 ns equilibration simulations for both the WT and MUT proteins. Interestingly, the results were quite similar to those obtained with the previous protocol.

Thank you once again for your assistance,

Regarding softcore, you can use the latest gromacs version and set the mdp option: sc-function=gapsys

One more thing, in your plot you show values in kcal/mol. If I remember correctly, in the paper we reported values in kJ/mol. Could it be that there is a unit conversion issue somewhere?

Regarding softcore, you can use the latest gromacs version and set the mdp option: sc-function=gapsys

Awesome, i didn’t know that. Thanks!

One more thing, in your plot you show values in kcal/mol. If I remember correctly, in the paper we reported values in kJ/mol. Could it be that there is a unit conversion issue somewhere?

Good point, but I just used the wrong unit in the plot. It should be kJ/mol.

I gonna rerun using the sc potential and keep you updated once I have the results. Thanks for your help.


I recently re-ran a subset of the 1stn dataset using the softcore potential with function=gapsys. This led to a significant change in the results, which now fit better into the expected range. However, the correlation remains poor.

In addition, I recalculated the delta G values for the tripeptides, and I found that these values also differ substantially from the values reported on your website. For instance, using the amber99sb_ildn_star forcefield, I obtained a value of 80.12 kJ/mol for V2A, while the website lists it as 70.48 kJ/mol.

I would greatly appreciate any advice regarding potential pitfalls or suggestions on where to proceed from here to resolve these discrepancies.

Thank you for your assistance,


Are you using python3 based ‘develop’ branch of pmx? If so, it has different mutation libraries than the ones on the webserver, which are based on the ‘master’ pmx branch. The tripeptide values on the webserver will have different dG values to the those calculated with the ‘develop’ branch.

Thank you for the helpful information. I was initially using the develop branch of pmx for my simulations. However, after switching to the master branch and re-running some tripeptide simulations (using sc-function=gapsys), I noticed mixed results. For certain cases (V2A, L2A), there is good agreement with the values on the webserver, but for others (V2A, I2A), the agreement remains quite poor. This discrepancy is puzzling.

I’m now considering whether some of my mdp options might be causing these inconsistencies. Therefore, it would be great if you provided an example file that was used for the 1stn study (and maybe also for the tripeptide)? Additionally, I would like to know if the master or develop branch of pmx was utilized for the 1stn study.

Thank you very much for your assistance.

Best regards,

How large are the differences you observe? For which force field do they manifest?

Here are the mdp files: equilibrium runs, velocity equilibration before the transition, transition run. Note that these simulations were done with an old gmx4.6 version

eq.mdp (7.7 KB)
eq_20ps.mdp (7.7 KB)
eq_ti_l0.mdp (7.6 KB)

Thank you very much for providing the mdp files. However, I have some questions regarding the velocity equilibration before the transition. In my current setup, I perform 20 ns equilibration runs for both the WT and MUT states. I then extract 100 equidistant frames from each of these runs and carry out a 50 ps transition for WT->MUT and MUT->WT, respectively. Is it necessary to perform velocity equilibration before the transitions in this setup?

I used your mdp files (for equilibration and transition) with GROMACS 2023.5 and the master branch of PMX, deploying the amber99sb-star-ildn-mut forcefield. I completed four runs each for F2A and I2A, obtaining -37.05±0.24 kJ/mol and -12.51±0.26 kJ/mol, respectively. The webserver reports -31.95±0.39 kJ/mol and -9.64±0.33 kJ/mol. Although the deviations are not overly large, they are still significant and do not appear to be outliers, as I consistently obtained these values over multiple separate runs. Interestingly, for the other two tripeptides I tested (V2A and L2A), I was able to perfectly reproduce the results from the webserver.

  1. Velocity equilibration not necessary if you have extracted them into .gro files. The setup used in the paper simulated WT and Mut states without the hybrid topology. Hybrid structures were created on the extracted snapshots, so velocity equilibration was needed.

  2. Difficult to say: gmx version change might have an effect, as there have been many modifications to the free energy code over the years. To completely eliminate any possible discrepancy between pmx versions, could you try running with the topology and structure for F2A provided by the webserver?

Thank you for the clarification. I plan to conduct another simulation using the structure and topology generated by the webserver, utilizing the GMX2023.5 version. Moreover, I will also perform a simulation with the GROMACS 4.6 version available on your website. If there are any differences between these two simulations, it would indicate that the discrepancy stems from the GROMACS version being used.

Here are my results: I was running F2A tripeptide only. When using the GROMACS 4.6 version available on your website I obtain dG=-35.09 kJ/mol, and with the GROMACS 2023.5 I obtain dG=-36.02 kJ/mol. In both cases i used the structure and topology file from the webserver and the mdp files that you have provided. For the GROMACS 2023 run I adapted the mdp file to use the gapsys sc-function and set the electrostatics/VDW parameters to

coulombtype              = PME
rcoulomb-switch          = 0
rcoulomb                 = 1.2
vdw-type                 = Cut-off
vdw-modifier             = Potential-switch
rvdw-switch              = 1.0 
rvdw                     = 1.2

I somehow consistently underestimate dG for F2A compared to the value reported on your website. Can that still just be a statistical outlier?


Hi Marc,

The tripeptide values reported in the webserver seem to be well-reproducible using the structure and topology files (and mdp files from the Download section) from the webserver. The modifications I made to mdp files are: cutoff-scheme changed from group to verlet and rdvw set to 1.2nm. Here are my results of 5 independent replicas for each tripeptide, each simulated for 6 ns and 80 frames extracted from the last 4 ns, followed by 100ps short transitions. The gromacs versions used are 5.1, 2018.6 and 2023.4.

The small difference between the new values and the reported ones could be because of the small difference in the mdp options, a slightly different setup and the number of independent replica (just one for the value reported in webserver).

I would suggest to use the mdp files from “Download” section and multiple independent replicas to test the reproducibility.

I hope it helps.


Dear Sudarshan,

Thank you very much for your response. I truly appreciate the effort you’ve put into providing a comprehensive answer. However, the situation remains puzzling to me. I reran the F2A tripeptide simulation using the mdp files from the website, incorporating the modifications you suggested. I also used the structure and topology files from the webserver, along with GROMACS version 2023.5. After four independent runs, I obtained a dG value of -35.15 ± 0.74 kJ/mol.

Considering your results, it seems we can eliminate the GROMACS version, mdp files, and structure/topology files as potential sources of discrepancy. The only remaining factor is the specific GROMACS commands used. Could you please share the script you used to run the simulations?

I have attached the bash script I employed for the A2B transition (the B2A script is equivalent). Is there something I might be doing incorrectly?

Once again, thank you for your assistance and effort in this matter.



mkdir -p stateA
cp ./stateA/
cp hybrid.itp ./stateA/hybrid.itp
cp hybrid.pdb ./stateA/hybrid.pdb
cd stateA

# correct the path to the forcefield
sed -i 's/#include "amber99sb-star-ildn-mut.ff\/forcefield.itp"/#include "\/app\/GMX\/alchemical_trans\/forcefields\/amber99sb-star-ildn-mut.ff\/forcefield.itp"/g'
sed -i 's/#include "amber99sb-star-ildn-mut.ff\/tip3p.itp"/#include "\/app\/GMX\/alchemical_trans\/forcefields\/amber99sb-star-ildn-mut.ff\/tip3p.itp"/g'
sed -i 's/#include "amber99sb-star-ildn-mut.ff\/ions.itp"/#include "\/app\/GMX\/alchemical_trans\/forcefields\/amber99sb-star-ildn-mut.ff\/ions.itp"/g'

# make box and solvate
gmx editconf -f hybrid.pdb -o box.pdb -bt dodecahedron -d 0.9
gmx solvate -cp box -cs spc216 -o solvated.pdb -p hybrid

# add ions to neutralize system
gmx grompp -f ../mdpA/em.mdp -c solvated.pdb -p -o ion.tpr -maxwarn 2 -r solvated.pdb
#echo -e "SOL" | gmx genion -s ion.tpr -o ion.pdb -p -pname NA -nname CL -conc 0.15 -neutral
echo -e "SOL" | gmx genion -s ion.tpr -o ion.pdb -p -pname NA -nname CL -neutral

# run EM
gmx grompp -f ../mdpA/em.mdp -c ion.pdb -p -o em.tpr -r ion.pdb -maxwarn 1
gmx mdrun -v -deffnm em -nt $NT -gpu_id $GPU_ID

# recenter protein in the box
echo 0 | gmx trjconv -s em.tpr -f em.gro -o em.pdb -ur compact -pbc mol

# run MD
gmx grompp -f ../mdpA/md.mdp -c em.pdb -p -o md.tpr -maxwarn 2
gmx mdrun -v -deffnm md -nt $NT -gpu_id $GPU_ID

# Extract Frames

gmx check -f "md.xtc" &> log.txt

# make proper linebreaks in log file
sed 's/\r/\n/g' log.txt > log_unix.txt 

total_frames=$(awk '/Last frame/ {print $3}' log_unix.txt)  
total_time=$(awk '/Last frame/ {print $5}' log_unix.txt)  
total_time=$(echo "$total_time" | cut -d'.' -f1)
frames_to_skip=$(echo "0.2 * $total_frames" | bc | awk '{print int($1+0.5)}')
time_to_skip=$(echo "0.2 * $total_time" | bc | awk '{print int($1+0.5)}')

# Calculate the interval for extracting 100 frames (after skipping the first 20%)
remaining_frames=$(($total_frames - $frames_to_skip))
interval=$(echo "$remaining_frames / 100" | bc | awk '{print int($1+0.5)}')

mkdir -p morphes

# Extract the frames using gmx trjconv
echo "0" | gmx trjconv -s md.tpr -f md.xtc -o morphes/frame.gro -ur compact -sep -pbc mol -skip $interval -b $time_to_skip

# Run Transition Simulations

# Create .tpr files and run transformations
for i in {0..99}
    mkdir -p morphes/frame$num
    cd morphes/frame$num
    mv ../frame$i.gro frame$num.gro
    gmx grompp -p ../../ -c frame$num.gro -f ../../../mdpA/morph.mdp -v -maxwarn 2 -o frame$num.tpr
    gmx mdrun -deffnm frame$num -gpu_id $GPU_ID -nt $NT -v -dhdl dgdl.xvg &> frame$num.log
    rm *#
    cd ../..

Hi Marc,

A few differences between your protocol and mine, that might have caused the difference, are

  • I used ‘gmx editconf -d 1.5’ (instead of 0.9).
  • Also, Joung & Cheatham ions (NaJ and ClJ) were added in my case. I see you’ve used the default Aqvist (NA and CL) ions.

I also did a short 10ps nvt equilibration after em and before the final npt md run. It’s unlikely to cause the difference given a reasonable npt equilbration.

I’ve attached the scripts ( and and mdp files.

Since I used inputs from the webserver, not all the functions in “” script are relevant (like “hybrid_structure_topology”). I know you’ll be able to figure that out.

I hope this helps.


ti_l1.mdp (7.7 KB)
ti_l0.mdp (7.7 KB)
eq_l1.mdp (7.7 KB)
eq_l0.mdp (7.7 KB)

eq_nvt_l1.mdp (7.7 KB)
eq_nvt_l0.mdp (7.7 KB)
em_l1.mdp (7.7 KB)
em_l0.mdp (7.7 KB) (29.0 KB) (1.9 KB)

Dear Sudarshan,

Thank you for providing the files and your insights. I have already conducted new simulations using the Joung & Cheatham ions and a box distance of 1.5, but the results remain unchanged. I plan to examine your script and attempt to run it on my machine, which will hopefully yield the necessary information to address this issue.

In the meantime, I was curious about the location of the unmutated tripeptide PDB files, similar to the ones stored in the tripepPath ‘…/tripep/amber99sb-star-ildn-mut’. I have been using a mutated tripeptide PDB structure, which I manually “unmutated” and subsequently processed through the webserver. I suspect this approach might be contributing to the discrepancies in my results. Could you please guide me to the original unmutated tripeptide PDB files?

Again, thanks for your help.
All the best,

Hi Marc,

I wonder why you need to manually “unmutate” and process through pmx webserver again.

Already preprocessed mutated tripeptide files (pdb and top) can be downloaded from the webserver directly (, as tripep_X2Y.tar.gz.

Since I used these downloaded files (tripep_X2Y.tar.gz), the “tripepPath” became irrelevant. Maybe, you could try the same and see.

However, I agree, “unmutating and webserver processing” approach should not lead to a large deviation from the reported values.


Dear Sudarshan,

Initially, I manually unmutated the tripeptide and processed it through the webserver to ensure the correct version of PMX was used for the mutant. In hindsight, I realize that this was never an issue.

Fortunately, I have identified the root cause of the issue! Stupid me was using the .xtc file of the md run to extract frames for the transition from… I was not aware that the .xtc fie does not store the velocities. I reran with the .trr file and could finally reproduce the dG for the F2A tripeptide :partying_face: Your script proved invaluable, as I noticed the discrepancy when comparing your protocol to mine.

With this issue resolved, I will continue my efforts to reproduce the ddG for the 1stn mutations, confident that the problem has been addressed.

Thank you for your assistance!