I would like to perform large-scale scoring of my protein models using HADDOCK3. I have approximately 10,000 models that I need to evaluate. What would be the best strategy or recommended approach to handle this scale efficiently?
Dear dongl,
HADDOCK3 is limited to 20 input molecule files, but there is no limit on the number of conformations within a single input ensemble.
To process 10,000 models, I would suggest merging them into a single file using pdb_mkensemble (from the pdb-tools suite).
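As a minimal sketch, assuming your models are individual PDB files in the current directory (the model_*.pdb pattern and the output name are placeholders, adjust them to your actual files):

pdb_mkensemble model_*.pdb > 10000_ensemble.pdb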
Then, you can process them using a standard scoring workflow:
run_dir = "big_scoring_run"
molecules = ["10000_ensemble.pdb"]
# Generation of topologies
[topoaa]
# An energy minimisation step followed by scoring with the HADDOCK scoring function
[emscoring]
# Clustering by Fraction of common contacts
[clustfcc]
clust_cutoff = 0.9 # Group together models having >= 90% similar contacts
min_population = 1 # Keep all models, even those that are not clustered (singletons)
# Grouping models by clusters
[seletopclusts]
top_cluster = 10000 # keep all clusters, in case every model ends up in its own cluster
top_models = 10000 # keep all models per cluster, in case they all fall in the same cluster
# A final analysis step to generate the plots
[caprieval]
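To launch the run, save the workflow above to a configuration file (the name scoring.cfg below is just an example) and pass it to the haddock3 executable:

haddock3 scoring.cfg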
For reference, please see what we did for the scoring challenge in the CAPRI rounds using HADDOCK3: https://doi.org/10.1002/prot.26789