Running clustfcc across multiple Haddock runs

Dear Haddock team,

I am trying to predict the binding site of a binder to its target (the binder is a protein engineered to bind to a specific target) by performing protein-protein docking with Haddock3 (standard Haddock2.X workflow). As I have information on which residues are involved in the docking for both the binder and the target, I performed the docking with restraint files. I performed the docking for multiple different binders which have different binding affinities to the target. All binders have the same backbone scaffold with a mutated binding interface (around 10-15 residues are different between each binder), so I used the same restraint files for all runs to enable comparisons across binders. For each binder, I performed three replicates of the docking experiment with different random seeds.

My aim is now to find out if the different binders yield similar docking poses. For this, my strategy is the following:

  1. First, I want to verify that the randomness of docking does not substantially influence my results. For this, I want to find out for each binder whether the triplicates yield similar docking poses.

  2. Second, I want to verify across binders whether I get similar docking poses or not.

In both cases, I would like to cluster the results of multiple docking runs by fraction of common contacts (e.g. run clustfcc on the poses of all three replicates for one binder). However, I cannot seem to make clustfcc run without a docking workflow in front of it.

Does this mean that my strategy is not optimal and I should use other tools to find similar docking poses across triplicates and binders? Or is there a way of running clustfcc standalone?

Thank you very much for your help!

All the best,

Pauline

Hi Pauline

One way of doing that is to generate an ensemble that contains all your conformations from the different runs and then run a scoring workflow, e.g.:


# ====================================================================
# Scoring example

# directory in which the scoring will be done
run_dir = "run1-score-cluster"

# execution mode
ncores = 40
mode = "local"

# ensemble of different complexes to be scored
molecules = ["my-ensemble.pdb"]

# ====================================================================
# Parameters for each stage are defined below

[topoaa]

[emscoring]

[clustfcc]

[seletopclusts]

[caprieval]

# ====================================================================

You can create the ensemble with pdb_mkensemble from pdb-tools.
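On the command line, pdb_mkensemble is typically run as `pdb_mkensemble model1.pdb model2.pdb > my-ensemble.pdb`. If you would rather build the ensemble from a script (for example to collect models from several run directories), the idea can be sketched in Python as below. This is only a minimal illustration, not the pdb-tools implementation: the function name `build_ensemble` is hypothetical, it assumes uncompressed single-model PDB files, and I have not checked its output against haddock3.

```python
def build_ensemble(pdb_paths, out_path):
    """Concatenate single-model PDB files into one multi-model ensemble.

    Hypothetical helper for illustration only; assumes each input file
    holds exactly one uncompressed model.
    """
    pdb_paths = list(pdb_paths)
    coordinate_records = ("ATOM", "HETATM", "TER")

    with open(out_path, "w") as out:
        # Header remarks record which input file each model number came from.
        for number, path in enumerate(pdb_paths, start=1):
            out.write(f"REMARK     MODEL {number} FROM {path}\n")

        for number, path in enumerate(pdb_paths, start=1):
            # MODEL record: serial number right-justified in columns 11-14.
            out.write(f"MODEL     {number:>4}\n")
            with open(path) as handle:
                for line in handle:
                    # Keep coordinate records only; drop per-file headers and END.
                    if line.startswith(coordinate_records):
                        out.write(line.rstrip("\n") + "\n")
            out.write("ENDMDL\n")
        out.write("END\n")

    return len(pdb_paths)
```

You would call it with the list of models from all replicates, e.g. `build_ensemble(["rep1_model_1.pdb", "rep2_model_1.pdb"], "my-ensemble.pdb")` (file names here are placeholders), and then point `molecules` in the config at the resulting file.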

Once the run has finished, you can look in the traceback directory to trace each model back to its original input PDB.


Hello,

Thank you very much, that worked very well!

Glad to hear that!