Does it make sense to use negative controls in ab-initio simulations with HADDOCK3?

Hi everyone,
I’m conducting protein-protein docking using HADDOCK3 on 13,907 protein pairs in an ab initio mode due to lack of experimental information about my proteins. I have started to obtain results and now need to analyze my predictions, but I have some questions:
Do you have any negative controls with proteins that are known not to interact, and HADDOCK3 predicted them accurately?
I anticipate having numerous results with poor predictions to indicate that these pairs cannot interact and form a complex. I expected to have some positive scores and energies to suggest that the proteins do not interact, but currently, all my scores are negative. All the complexes exhibit negative scores and energies.
I obtained some results with only one cluster and a few models within it, particularly when one of the proteins is disordered. Is it reasonable to conclude that if the prediction yields only one cluster, the interaction is likely not occurring?

Dear YFeri,

Thanks for your interest in using haddock3 for your research.

Do you have any negative controls with proteins that are known not to interact, and HADDOCK3 predicted them accurately?

No, we do not have any data on this. It is very difficult to validate a prediction for proteins that are known not to interact: if they do not interact, no structure of the complex is available, so there is nothing to compare the models against.

I anticipate having numerous results with poor predictions to indicate that these pairs cannot interact and form a complex. I expected to have some positive scores and energies to suggest that the proteins do not interact, but currently, all my scores are negative. All the complexes exhibit negative scores and energies.

HADDOCK always tries to minimize the score. The only ways to obtain a positive score are unsatisfied ambiguous interaction restraints, van der Waals clashes, or electrostatic interactions between (partial) charges of the same sign. To distinguish binders from non-binders, you can only rely on the difference between two scores.
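
To illustrate why scores usually come out negative and how two pairs might be compared, here is a minimal Python sketch. It assumes the score is a weighted sum of energy terms. The weights, the term names, the desolvation term and all numbers are illustrative assumptions, not values taken from this thread or from any run, so check the scoring parameters of the modules in your own workflow.

```python
# Illustrative weights (assumption): check the scoring parameters of your own run.
WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "air": 0.1}


def weighted_score(terms, weights=WEIGHTS):
    """Weighted sum of energy terms; terms missing from `terms` count as zero."""
    return sum(weights[name] * terms.get(name, 0.0) for name in weights)


def score_difference(terms_a, terms_b, weights=WEIGHTS):
    """Score of pair A minus score of pair B (negative means A scores better)."""
    return weighted_score(terms_a, weights) - weighted_score(terms_b, weights)


# Made-up energy terms for two hypothetical protein pairs.
pair_a = {"vdw": -40.0, "elec": -150.0, "desolv": 5.0, "air": 0.0}
pair_b = {"vdw": -20.0, "elec": -50.0, "desolv": 10.0, "air": 0.0}

print("pair A:", weighted_score(pair_a))
print("pair B:", weighted_score(pair_b))
print("A - B :", score_difference(pair_a, pair_b))
```

Both hypothetical pairs get a negative score because favourable van der Waals and electrostatic contacts dominate, so the sign alone says nothing about binding. Only the difference between the two is informative, and even that should be read with caution.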

I obtained some results with only one cluster and a few models within it, particularly when one of the proteins is disordered. Is it reasonable to conclude that if the prediction yields only one cluster, the interaction is likely not occurring?

If you are using (spaghetti) AlphaFold models as initial conformations, it will be very difficult to obtain a binding mode that occurs multiple times, especially in ab initio simulations without ambiguous interaction restraints. I would suggest removing the disordered parts (pLDDT < 60) so that you have a chance to get the domain-domain interaction without struggling with VdW clashes between disordered regions. Also, keep in mind that those disordered regions are flexible, and you probably only have a snapshot representing one of the available conformations, which for disordered regions is unlikely to be the one that binds your partner.
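
One possible way to do the trimming is sketched below in plain Python. It assumes the models are AlphaFold PDB files, where the per-residue pLDDT is stored in the B-factor column (columns 61-66 of ATOM records). The function name `trim_low_plddt` and the file names in the usage comment are hypothetical, and the cutoff of 60 is simply the value suggested above.

```python
def trim_low_plddt(pdb_text, cutoff=60.0):
    """Drop ATOM/HETATM records whose B-factor (assumed to hold pLDDT) is below cutoff.

    All other records (and coordinate lines too short to parse) are kept as-is.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            try:
                plddt = float(line[60:66])  # PDB columns 61-66: temperature factor
            except ValueError:
                kept.append(line)  # unparsable line: leave it untouched
                continue
            if plddt < cutoff:
                continue  # low-confidence (likely disordered) atom: skip it
        kept.append(line)
    return "\n".join(kept) + "\n"


# Usage (hypothetical file names):
# with open("model.pdb") as handle:
#     trimmed = trim_low_plddt(handle.read(), cutoff=60.0)
# with open("model_trimmed.pdb", "w") as handle:
#     handle.write(trimmed)
```

Note that removing a low-confidence loop in the middle of a chain leaves a chain break, so it is worth inspecting the trimmed model before docking.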

I hope this answer helps.

Adding to that, it might be good to check our related papers. There is a poor correlation between docking scores and binding affinity…

Hence trying to predict interactions by docking is a hard problem.

@VGPReys @amjjbonvin
Thank you for your responses and your help.