Combining runs and re-running analysis

Hello!

Expanding from this post (Ensemble vs separate docking runs), I wanted to combine ten HADDOCK runs and re-run the analysis (specifically the clustering part). However, I need the pairwise RMSD matrix file (complex_rmsd.disp) to run the clustering scripts. From the manual I’ve seen that the rmsd.inp script is the one that creates that file, but I am not sure how to use it for a combination of ten runs.

Any help is appreciated, thanks!

There is no simple solution to that. If you are dealing with protein-protein docking, I would recommend using the fraction of common contacts (FCC) clustering approach.
It is much faster than RMSD-based clustering.

For RMSD clustering you would first need to calculate the full RMSD matrix (which is what the rmsd.inp CNS script does, but it is not meant to be run separately).

For FCC clustering see: https://github.com/haddocking/fcc

Another possibility would be to use the new (still beta) haddock3 code and its scoring example, giving as input the list of all PDBs from your combined runs.

See:

https://www.bonvinlab.org/haddock3/
https://github.com/haddocking/haddock3

Dear Prof. Bonvin,

I am running protein-ligand docking, that’s why I am using the RMSD as the metric for clustering. Is there a way to calculate the full RMSD matrix for the combined runs?

Another question: in the docking ensemble tutorial I found the following sentence: “We however recommend to limit the number of conformers used for docking, since the number of conformer combinations of the input molecules might explode (e.g. 10 conformers each will give 100 starting combinations and if we generate 1000 rigid body models each combination will only be sampled 10 times).” I don’t understand what you mean by “100 starting combinations”, don’t you mean that each conformer will be sampled 100 times if there are 1000 rigid body models?

Thanks!

I am running protein-ligand docking, that’s why I am using the RMSD as the metric for clustering. Is there a way to calculate the full RMSD matrix for the combined runs?

I am afraid you will have to write a script yourself to do it. One program you could use for this is ProFit.
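As an illustration of what such a script could look like, here is a minimal Python sketch that uses only NumPy rather than ProFit. It rests on several assumptions that are not from this thread: every model contains the same atoms in the same order, the receptor and ligand can be selected by a chain ID or segid (the values "A" and "B" and the file names in the usage comment are hypothetical), and the RMSD is computed on the ligand after fitting on the receptor, which is one common convention and not necessarily what rmsd.inp does. It returns the matrix as an array and does not reproduce the complex_rmsd.disp file format.

```python
import numpy as np


def read_coords(pdb_path, selection):
    """Return an (N, 3) array of ATOM/HETATM coordinates whose chain ID
    (column 22) or segid (columns 73-76) equals `selection`."""
    coords = []
    with open(pdb_path) as handle:
        for line in handle:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            if line[21:22] == selection or line[72:76].strip() == selection:
                coords.append([float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54])])
    return np.array(coords)


def kabsch(mobile, target):
    """Least-squares superposition of `mobile` onto `target`.

    Returns (R, mobile_centroid, target_centroid) such that
    (x - mobile_centroid) @ R + target_centroid maps mobile onto target.
    """
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mc).T @ (target - tc)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return R, mc, tc


def ligand_rmsd_matrix(receptors, ligands):
    """Symmetric matrix of ligand RMSDs after fitting on the receptor.

    `receptors` and `ligands` are lists of (N, 3) arrays, one per model,
    with identical atom order across models.
    """
    n = len(receptors)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            R, mc, tc = kabsch(receptors[j], receptors[i])
            fitted = (ligands[j] - mc) @ R + tc
            diff = fitted - ligands[i]
            rmsd = np.sqrt((diff ** 2).sum(axis=1).mean())
            matrix[i, j] = matrix[j, i] = rmsd
    return matrix


# Hypothetical usage: `files` would list the PDBs gathered from all ten runs.
# files = ["run1/complex_1.pdb", "run2/complex_1.pdb"]
# receptors = [read_coords(f, "A") for f in files]
# ligands = [read_coords(f, "B") for f in files]
# matrix = ligand_rmsd_matrix(receptors, ligands)
```

The double loop scales quadratically with the number of models, so for ten combined runs it may be worth restricting the input to the top-ranked models of each run.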

Another question: in the docking ensemble tutorial I found the following sentence: “We however recommend to limit the number of conformers used for docking, since the number of conformer combinations of the input molecules might explode (e.g. 10 conformers each will give 100 starting combinations and if we generate 1000 rigid body models each combination will only be sampled 10 times).” I don’t understand what you mean by “100 starting combinations”, don’t you mean that each conformer will be sampled 100 times if there are 1000 rigid body models?

If we have, say, 10 models for each molecule, you can create 100 combinations of starting conformations (10 x 10).
Now if you generate 1000 models in the first (rigid-body) stage, each combination of starting conformations will be sampled only 10 times.
If only one combination leads to good results, this can be problematic. It all depends on the information at hand to guide the docking.
In such cases you might also increase the sampling to e.g. 10000 rigid body docking models.
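The arithmetic can be checked with a short Python sketch. It assumes the rigid-body models are spread evenly over all combinations; the function name is made up for this illustration. It also shows where the "100 times" from the question comes from: each individual conformer takes part in 10 combinations, so it is sampled 100 times in total, while each specific pair of conformers is sampled only 10 times.

```python
import math


def sampling_per_combination(conformers_per_molecule, n_rigid_body_models):
    """Return (number of combinations, models per combination),
    assuming models are distributed evenly over the combinations."""
    n_combinations = math.prod(conformers_per_molecule)
    return n_combinations, n_rigid_body_models / n_combinations


# Two molecules with 10 conformers each, 1000 rigid-body models.
combos, per_combo = sampling_per_combination([10, 10], 1000)
print(combos, per_combo)  # 100 10.0

# One conformer of the first molecule pairs with all 10 of the second,
# so it is seen in 10 combinations x 10 models each.
print(10 * per_combo)  # 100.0

# Increasing the sampling to 10000 models gives 100 models per combination.
print(sampling_per_combination([10, 10], 10000))  # (100, 100.0)
```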