Different results on Intel/AMD machines

I’m getting different results on an Intel and an AMD machine when running examples/docking-protein-protein/docking-protein-protein-test.cfg

command to run

cd examples/docking-protein-protein
haddock3 docking-protein-protein-test.cfg

AMD’s run1-test/7_caprieval/capri_ss.tsv:

model   md5 caprieval_rank  score   irmsd   fnat    lrmsd   ilrmsd  dockq   cluster-id  cluster-ranking self.model-cluster-ranking
../6_emref/emref_1.pdb  -   1   -110.599    9.146   0.111   17.011  15.393  0.112   -   -   -
../6_emref/emref_2.pdb  -   2   -105.002    2.145   0.556   3.893   3.476   0.570   -   -   -
../6_emref/emref_3.pdb  -   3   -88.291 9.267   0.111   16.617  13.212  0.115   -   -   -
../6_emref/emref_4.pdb  -   4   -81.664 11.176  0.139   18.428  18.058  0.111   -   -   -
../6_emref/emref_5.pdb  -   5   -74.947 11.047  0.111   18.039  17.959  0.104   -   -   -

Intel’s run1-test/7_caprieval/capri_ss.tsv:

model   md5 caprieval_rank  score   irmsd   fnat    lrmsd   ilrmsd  dockq   cluster-id  cluster-ranking self.model-cluster-ranking
../6_emref/emref_1.pdb  -   1   -110.232    1.450   0.694   2.915   2.326   0.702   -   -   -
../6_emref/emref_2.pdb  -   2   -92.448 2.680   0.500   5.157   4.130   0.490   -   -   -
../6_emref/emref_5.pdb  -   3   -92.102 10.946  0.028   19.065  19.475  0.071   -   -   -
../6_emref/emref_3.pdb  -   4   -74.786 10.115  0.028   16.993  15.180  0.083   -   -   -
../6_emref/emref_4.pdb  -   5   -72.464 3.774   0.222   6.956   5.266   0.319   -   -   -

Other info:
1. Both use haddock3 git commit 0dad275.
2. Same CNS binary (compiled on the Intel machine and then copied to the AMD one).
3. Both run Ubuntu 22.04.
4. CPUs:

cat /proc/cpuinfo  | grep 'name'| uniq
model name      : 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
model name      : AMD EPYC 7532 32-Core Processor

5. In the 0_topoaa folder, the *.inp files are exactly the same, and the *.psf files are the same except for the one line giving the output date.
6. I’m getting slightly different _haddock.pdb files in the 0_topoaa folder, as follows (left is Intel):
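One way the size of such coordinate differences could be quantified is sketched below. This is only a minimal illustration, assuming both files list the same atoms in the same order and follow the standard fixed-column PDB layout; `read_coords` and `max_deviation` are hypothetical helper names, not part of haddock3.

```python
import math


def read_coords(pdb_text):
    """Return (x, y, z) tuples for every ATOM/HETATM record in a PDB string."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Standard PDB columns: x = 31-38, y = 39-46, z = 47-54 (1-based).
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def max_deviation(pdb_a, pdb_b):
    """Largest per-atom distance between two PDB strings, without superposition.

    Assumes identical atom order in both files (an assumption of this sketch).
    """
    a, b = read_coords(pdb_a), read_coords(pdb_b)
    if len(a) != len(b):
        raise ValueError("atom counts differ, files are not directly comparable")
    return max((math.dist(p, q) for p, q in zip(a, b)), default=0.0)


# Hypothetical usage; the paths are placeholders:
# with open("intel/0_topoaa/model_haddock.pdb") as fa, \
#      open("amd/0_topoaa/model_haddock.pdb") as fb:
#     print(max_deviation(fa.read(), fb.read()))
```

No superposition is done here, which seems reasonable for 0_topoaa output where both machines start from the same input coordinates; for docked models a fitted RMSD would be more appropriate.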

You cannot get exactly the same results when running on different hardware.
The computations are chaotic by nature, so tiny numerical differences grow over the course of a run. Full reproducibility is only achievable on the same hardware.

PS: To compare two hardware setups, it is better to perform a full run and check whether the results are consistent (they won’t be exactly the same).
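To illustrate what "chaotic" means here, the toy sketch below shows two generic effects: floating-point addition gives different last bits depending on evaluation order, and in a chaotic iteration a difference of that size grows until the two trajectories are unrelated. The logistic map is only a stand-in for a sensitive computation; nothing here is taken from CNS or haddock3, and whether evaluation order is the actual cause of the Intel/AMD difference is an assumption.

```python
def logistic_gap(x0, eps, steps):
    """Iterate x -> 4x(1-x) from x0 and from x0 + eps; return the gap after each step."""
    x, y = x0, x0 + eps
    gaps = []
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        y = 4.0 * y * (1.0 - y)
        gaps.append(abs(x - y))
    return gaps


# 1) Floating-point addition is not associative: the grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
order_matters = left != right
print(order_matters)  # True

# 2) A perturbation of about 1e-15 in the start value grows to order 1.
gaps = logistic_gap(0.3, 1e-15, 100)
print(gaps[0], max(gaps))
```

The same mechanism explains why a full run is the better comparison: individual models diverge, but the overall distribution of results should stay consistent.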

Thanks for the reply.

I ran docking-protein-protein-full.cfg on three local setups:

setup1: 11th Gen Intel(R) Core™ i7-11700 @ 2.50GHz
setup2: Intel(R) Core™ i9-10900 CPU @ 2.80GHz
setup3: AMD EPYC 7532 32-Core Processor

Here’s what I found:

setup1 and setup2 produce exactly the same 08_caprieval/capri_ss.tsv, down to the last digit:

model   md5     caprieval_rank  score   irmsd   fnat    lrmsd   ilrmsd  dockq   cluster-id      cluster-ranking self.model-cluster-ranking
../07_emref/emref_32.pdb        -       1       -133.753        2.090   0.694   3.979   3.088   0.618   -       -       -
../07_emref/emref_7.pdb -       2       -123.898        1.901   0.694   3.476   2.849   0.645   -       -       -
../07_emref/emref_20.pdb        -       3       -120.499        0.973   0.833   1.497   1.226   0.836   -       -       -
../07_emref/emref_44.pdb        -       4       -119.955        0.917   0.889   1.527   1.416   0.862   -       -       -
../07_emref/emref_78.pdb        -       5       -119.565        0.969   0.889   1.504   1.319   0.855   -       -       -
../07_emref/emref_117.pdb       -       6       -117.615        10.917  0.139   18.054  18.025  0.113   -       -       -
../07_emref/emref_26.pdb        -       7       -117.032        1.693   0.639   2.813   2.158   0.660   -       -       -
../07_emref/emref_8.pdb -       8       -116.344        2.552   0.500   4.728   3.784   0.507   -       -       -
../07_emref/emref_89.pdb        -       9       -115.889        2.445   0.444   4.823   3.066   0.491   -       -       -

Meanwhile, setup3 gives quite different top scores and rankings:

../07_emref/emref_8.pdb -       1       -126.115        1.949   0.722   3.876   2.855   0.641   -       -       -
../07_emref/emref_75.pdb        -       2       -120.761        1.454   0.750   2.981   2.533   0.719   -       -       -
../07_emref/emref_4.pdb -       3       -120.174        1.635   0.722   2.843   2.385   0.693   -       -       -
../07_emref/emref_1.pdb -       4       -117.396        1.371   0.806   2.257   2.011   0.762   -       -       -
../07_emref/emref_3.pdb -       5       -116.627        2.612   0.528   4.944   4.049   0.508   -       -       -
../07_emref/emref_5.pdb -       6       -114.115        3.834   0.444   7.359   5.301   0.383   -       -       -
../07_emref/emref_57.pdb        -       7       -113.359        1.625   0.639   2.678   2.271   0.670   -       -       -
../07_emref/emref_6.pdb -       8       -113.011        2.385   0.500   4.399   3.596   0.524   -       -       -
../07_emref/emref_65.pdb        -       9       -111.223        0.967   0.889   1.818   1.512   0.851   -       -       -

So two machines can produce identical results, as setup1 and setup2 (both Intel) do, which is what I would prefer, while the difference between AMD and Intel CPUs is not insignificant. Furthermore, jobs on a cluster may be distributed across different architectures, so output scores and structures could change from run to run, which makes reproducing results difficult.
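For anyone repeating this comparison, a minimal sketch of how two capri_ss.tsv files could be compared model by model is given below. It assumes the files are tab-separated with the `model`, `caprieval_rank` and `score` columns shown above; `load_scores` and `compare_runs` are hypothetical helper names, not haddock3 functions.

```python
import csv
import io


def load_scores(tsv_text):
    """Map model path -> (rank, score) from the text of a capri_ss.tsv file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["model"]: (int(row["caprieval_rank"]), float(row["score"]))
        for row in reader
    }


def compare_runs(tsv_a, tsv_b):
    """For models present in both runs, return (model, rank_a, rank_b, score_b - score_a)."""
    a, b = load_scores(tsv_a), load_scores(tsv_b)
    rows = []
    for model in sorted(a.keys() & b.keys()):
        (rank_a, score_a), (rank_b, score_b) = a[model], b[model]
        rows.append((model, rank_a, rank_b, score_b - score_a))
    return rows


# Hypothetical usage; the directory names are placeholders:
# with open("intel/08_caprieval/capri_ss.tsv") as fa, \
#      open("amd/08_caprieval/capri_ss.tsv") as fb:
#     for model, rank_a, rank_b, delta in compare_runs(fa.read(), fb.read()):
#         print(f"{model}\t{rank_a}\t{rank_b}\t{delta:+.3f}")
```

Matching by file name only makes sense if both runs use the same configuration and seeds, so that a given model index starts from the same conditions on both machines.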


Thanks for this detailed explanation. Could you please check whether the model
../07_emref/emref_32.pdb from setup1/2 is the same as ../07_emref/emref_8.pdb from setup3?

Also, @unmerged, please remember that haddock3 in its current state is still very experimental, has not been tested or benchmarked, and is not recommended for production. Please refer to the current production version, HADDOCK2.4.

Interesting

Although the scores are different, the quality of the models is quite similar.
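A quick way to put numbers on "similar quality" is to bin the DockQ values of the top models into the commonly used DockQ classes (below 0.23 incorrect, 0.23 to 0.49 acceptable, 0.49 to 0.80 medium, 0.80 and above high). The sketch below does this for the top 9 models of the full runs; the values are copied from the two tables posted above, and the function names are hypothetical.

```python
from collections import Counter


def dockq_class(dockq):
    """Classify a DockQ value using the commonly used quality cutoffs."""
    if dockq >= 0.80:
        return "high"
    if dockq >= 0.49:
        return "medium"
    if dockq >= 0.23:
        return "acceptable"
    return "incorrect"


def summarize(values):
    """Count how many models fall into each DockQ class."""
    return Counter(dockq_class(v) for v in values)


# dockq column of the top 9 models, copied from the tables in this thread.
intel_top9 = [0.618, 0.645, 0.836, 0.862, 0.855, 0.113, 0.660, 0.507, 0.491]
amd_top9 = [0.641, 0.719, 0.693, 0.762, 0.508, 0.383, 0.670, 0.524, 0.851]

print("setup1/2 (Intel):", dict(summarize(intel_top9)))
print("setup3 (AMD):", dict(summarize(amd_top9)))
```

On these values, the Intel runs give 3 high, 5 medium and 1 incorrect model, and the AMD run gives 1 high, 7 medium and 1 acceptable model, so both top-9 lists are dominated by medium-or-better models.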

Did you check the cluster stats as well?