Combining runs and re-running analysis

mls333 · April 19, 2023, 9:04pm

Hello!

Expanding on this post (Ensemble vs separate docking runs), I want to combine ten HADDOCK runs and re-run the analysis (specifically the clustering part). However, I need the pairwise RMSD matrix file (complex_rmsd.disp) to run the clustering scripts. From the manual I’ve seen that the rmsd.inp script is the one that creates that file, but I am not sure how to use it for a combination of ten runs.

Any help is appreciated, thanks!

amjjbonvin · April 20, 2023, 12:44pm

There is no simple solution to that. If you are dealing with protein-protein docking, I would recommend using the fraction of common contacts (FCC) clustering approach.
It is much faster than RMSD-based clustering.

For RMSD clustering you would first need to calculate the full RMSD matrix (which is what the rmsd.inp CNS script does, but it is not meant to be run separately).

For FCC clustering see: https://github.com/haddocking/fcc

Another possibility would be to use the new (still beta) haddock3 code and follow its scoring example, giving as input the list of all PDBs from your combined runs.

See:

https://www.bonvinlab.org/haddock3/
https://github.com/haddocking/haddock3

mls333 · April 20, 2023, 1:25pm

Dear Prof. Bonvin,

I am running protein-ligand docking, that’s why I am using the RMSD as the metric for clustering. Is there a way to calculate the full RMSD matrix for the combined runs?

Another question: in the docking ensemble tutorial I found the following sentence: “We however recommend to limit the number of conformers used for docking, since the number of conformer combinations of the input molecules might explode (e.g. 10 conformers each will give 100 starting combinations and if we generate 1000 rigid body models each combination will only be sampled 10 times).” I don’t understand what you mean by “100 starting combinations”, don’t you mean that each conformer will be sampled 100 times if there are 1000 rigid body models?

Thanks!

amjjbonvin · April 21, 2023, 8:29am

I am running protein-ligand docking, that’s why I am using the RMSD as the metric for clustering. Is there a way to calculate the full RMSD matrix for the combined runs?

You will have to write a script yourself to do it, I am afraid. One piece of software you could use for this is ProFit.
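As a rough illustration of what such a script could look like, here is a minimal Python sketch that fits each pair of models on the receptor CA atoms and then computes the RMSD over the ligand heavy atoms. It is not the rmsd.inp CNS script and does not use ProFit; it only assumes NumPy. The function names, the `ligand_resname` argument, and the atom selections are hypothetical choices made for illustration. It also assumes that all models share the same atom order and it ignores ligand symmetry.

```python
import numpy as np


def read_model(path, ligand_resname):
    """Return (receptor CA coords, ligand heavy-atom coords) from one PDB file.

    Assumption: the ligand is identified by its residue name and hydrogens
    are recognised (crudely) by an atom name starting with 'H'.
    """
    rec, lig = [], []
    with open(path) as fh:
        for line in fh:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            name = line[12:16].strip()
            resname = line[17:20].strip()
            if resname == ligand_resname:
                if not name.startswith("H"):
                    lig.append(xyz)
            elif name == "CA":
                rec.append(xyz)
    return np.array(rec), np.array(lig)


def kabsch(P, Q):
    """Rotation R and translation t that best superimpose P onto Q."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t


def ligand_rmsd_matrix(receptors, ligands):
    """Symmetric matrix of ligand RMSDs after fitting on the receptor."""
    n = len(ligands)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Fit model j onto model i using the receptor atoms only.
            R, t = kabsch(receptors[j], receptors[i])
            lig_j = ligands[j] @ R.T + t
            diff = lig_j - ligands[i]
            mat[i, j] = mat[j, i] = np.sqrt((diff ** 2).sum(axis=1).mean())
    return mat


def matrix_from_files(paths, ligand_resname):
    """Read all models (e.g. the PDBs of all combined runs) and build the matrix."""
    models = [read_model(p, ligand_resname) for p in paths]
    return ligand_rmsd_matrix([m[0] for m in models], [m[1] for m in models])
```

Calling `matrix_from_files` with the list of PDB files collected from all ten runs and the residue name of the ligand would give the full pairwise matrix. It would still have to be written out in whatever format the clustering script expects, which is not covered here. The double loop scales quadratically with the number of models, so this can be slow for large combined sets.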

Another question: in the docking ensemble tutorial I found the following sentence: “We however recommend to limit the number of conformers used for docking, since the number of conformer combinations of the input molecules might explode (e.g. 10 conformers each will give 100 starting combinations and if we generate 1000 rigid body models each combination will only be sampled 10 times).” I don’t understand what you mean by “100 starting combinations”, don’t you mean that each conformer will be sampled 100 times if there are 1000 rigid body models?

If we have, say, 10 models for each of the two molecules, you can create 10 × 10 = 100 combinations of starting conformations.
Now if you generate 1000 models in the first (rigid-body) stage, each combination of starting conformations will be sampled only 10 times.
If only one combination leads to good results, this can be problematic. It all depends on the information at hand to guide the docking.
In such cases you might also increase the sampling to, e.g., 10000 rigid-body docking models.
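The arithmetic above can be checked with a small sketch. The function name is made up for illustration, and it assumes the rigid-body models are spread evenly over all combinations of starting conformations.

```python
def samples_per_combination(conformers_per_molecule, n_rigid_body_models):
    """Return (number of starting combinations, models sampled per combination).

    Assumption: models are distributed evenly over all combinations.
    """
    n_combinations = 1
    for n in conformers_per_molecule:
        n_combinations *= n
    return n_combinations, n_rigid_body_models / n_combinations


# Two molecules with 10 conformers each, 1000 rigid-body models.
print(samples_per_combination([10, 10], 1000))
# Same ensembles, sampling increased to 10000 rigid-body models.
print(samples_per_combination([10, 10], 10000))
```

With 1000 models each of the 100 combinations is sampled 10 times; raising the sampling to 10000 models brings this to 100 per combination.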
