HADDOCKING hetero-oligomer with crosslinking restraints

Dear HADDOCK team,

I’m trying to use HADDOCK to model a hetero-oligomer of protein 1 (chains A and B) and protein 2 (chains C and D) in a 2:2 stoichiometry. C and D are known to form a homo-dimer, and each copy interacts with one copy of protein 1: A-CD-B. The structural models of the single chains were predicted by AlphaFold 2 with high confidence.
I have several sources of information I would like to incorporate:
Information 1: from several high-resolution crystal structures we know that the last 3 residues of chain C interact with a binding pocket of chain A, and that chain D interacts with chain B in a similar manner. From the crystal structures I have the exact bond lengths between the respective atoms, and I put them in a .tbl file that I use as unambiguous restraints, with a distance definition of, for example, 2.70 2.70 0.

Information 2: from another crystal structure we know that chains C and D form a homo-dimer, so we used this information in the same way as for information 1.

Information 3: chains C and D each have a linker region that is known to connect two rigid regions of the protein. I defined the six amino acids of the linker region as fully flexible.

Information 4: we have a high-confidence crosslinking MS dataset from a hetero-bifunctional crosslinker with a spacer length of 3.9 Å. I added the crosslinked residues to the same .tbl file of unambiguous restraints mentioned above, with a distance definition of 20 18 5 between the Cα atoms of the residues.
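
For reference, here is a minimal sketch of how such restraint lines are usually read. It assumes the CNS-style convention used in HADDOCK .tbl files, where the three numbers are the target distance, the lower-bound correction and the upper-bound correction, so the allowed range runs from distance minus the lower correction to distance plus the upper correction. The residue numbers and atom names are hypothetical placeholders, not values from the actual system, and the helper functions are illustrative only.

```python
def restraint_bounds(distance, d_minus, d_plus):
    """Return the (lower, upper) distance range allowed by a CNS-style restraint."""
    return (distance - d_minus, distance + d_plus)


def format_restraint(sel1, sel2, distance, d_minus, d_plus):
    """Format one CNS-style 'assign' line for a .tbl file.

    sel1 and sel2 are (segid, residue number, atom name) tuples.
    """
    def sel(segid, resid, atom):
        return f"(segid {segid} and resid {resid} and name {atom})"

    return (f"assign {sel(*sel1)} {sel(*sel2)} "
            f"{distance:.2f} {d_minus:.2f} {d_plus:.2f}")


# Hypothetical residue numbers and atom names, for illustration only.
contact = format_restraint(("C", 250, "O"), ("A", 42, "N"), 2.70, 2.70, 0.0)
crosslink = format_restraint(("A", 15, "CA"), ("C", 120, "CA"), 20.0, 18.0, 5.0)

print(contact)
print(restraint_bounds(2.70, 2.70, 0.0))   # range for the crystal contact
print(crosslink)
print(restraint_bounds(20.0, 18.0, 5.0))   # range for the crosslink
```

Under that convention, 2.70 2.70 0 allows any distance from 0 to 2.70 Å, and 20 18 5 allows distances from 2 to 25 Å.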

Question 1) regarding information 1: Interestingly, in all the runs the docking of chains A and C is correct, but D is not correctly located in the binding pocket of chain B, even though the amount of information is pretty much the same. Do you have any idea why this is the case when using unambiguous restraints with no flexibility in the distance, as described above?

Question 2) regarding information 3: I have noticed that the flexibility of the linker regions is not used during the docking, meaning that the input structure superimposes almost perfectly with the resulting model. We assume that this flexibility is actually needed for docking, so we expected to see some changes in this region. Do you have any idea why HADDOCK is ignoring this piece of information?

Question 3) regarding information 4: Do you think a shorter distance range for the crosslinking data could help?

Thank you a lot in advance and best regards!

Question 1) regarding information 1: Interestingly, in all the runs the docking of chains A and C is correct, but D is not correctly located in the binding pocket of chain B, even though the amount of information is pretty much the same. Do you have any idea why this is the case when using unambiguous restraints with no flexibility in the distance, as described above?

Are you inputting your AB dimer as one molecule, with non-overlapping residue numbering? That would be a three-body docking.
Or are you performing a four-body docking?

Check your restraints carefully, making sure the residue numbers and segids are correct.
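
A check like this can be scripted. The following is only a rough sketch: it assumes standard fixed-column PDB records and simple, non-nested selections of the form (segid X and resid N and name ATOM) in the .tbl file. The function names and the tiny example data are made up for illustration.

```python
import re


def atoms_in_pdb(pdb_lines):
    """Collect (segid or chain ID, residue number, atom name) from ATOM/HETATM records."""
    atoms = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Use the segid columns (73-76); fall back to the chain ID (column 22).
            segid = line[72:76].strip() or line[21].strip()
            atoms.add((segid, int(line[22:26]), line[12:16].strip()))
    return atoms


def selections_in_tbl(tbl_lines):
    """Yield (segid, residue number, atom name) for each simple selection in 'assign' lines."""
    for line in tbl_lines:
        if not line.strip().lower().startswith("assign"):
            continue
        for sel in re.findall(r"\(([^()]*)\)", line):
            segid = re.search(r"\bsegid\s+(\S+)", sel, re.I)
            resid = re.search(r"\bresid?\s+(-?\d+)", sel, re.I)
            name = re.search(r"\bname\s+(\S+)", sel, re.I)
            if segid and resid and name:
                yield (segid.group(1), int(resid.group(1)), name.group(1))


def missing_selections(pdb_lines, tbl_lines):
    """Return the restraint selections that match no atom in the structure."""
    known = atoms_in_pdb(pdb_lines)
    return [s for s in selections_in_tbl(tbl_lines) if s not in known]


def pdb_atom(serial, name, resname, chain, resid, segid):
    """Build a fixed-column ATOM record (dummy coordinates) for the example."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resid:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}      {segid:<4s}")


# Hypothetical two-atom structure and two restraints, for illustration only.
pdb = [pdb_atom(1, "CA", "GLY", "A", 15, "A"),
       pdb_atom(2, "CA", "LYS", "C", 120, "C")]
tbl = [
    "assign (segid A and resid 15 and name CA) (segid C and resid 120 and name CA) 20.0 18.0 5.0",
    "assign (segid B and resid 15 and name CA) (segid D and resid 120 and name CA) 20.0 18.0 5.0",
]

for sel in missing_selections(pdb, tbl):
    print("not found in structure:", sel)
```

Any selection reported here points to a segid, residue number or atom name in the restraint file that does not exist in the input structure.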

Also, considering all the info you have at hand, it seems that some simple superimposition and refinement might already do the trick…

Are you imposing C2 symmetry on your system?

Question 2) regarding information 3: I have noticed that the flexibility of the linker regions is not used during the docking, meaning that the input structure superimposes almost perfectly with the resulting model. We assume that this flexibility is actually needed for docking, so we expected to see some changes in this region. Do you have any idea why HADDOCK is ignoring this piece of information?

Don’t expect large conformational changes to take place.
Also if your restraints are satisfied with the current conformation, there will be no driving force to induce conformational changes.

If you do expect the domains in your C/D proteins to change their conformation, one option is to pre-sample possible conformations and give HADDOCK an ensemble of conformations.

Another way to model the system would be stepwise:

  1. model the A-C complex

  2. use the output models from 1) to model the dimeric complex (for this to work, the residue numbering of the A-C complex must be non-overlapping)
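
To illustrate the non-overlapping numbering needed for step 2, here is a small sketch that checks two chains of a model for shared residue numbers and shifts one of them by a fixed offset. It assumes standard fixed-column PDB records. The offset of 1000, the function names and the toy model are arbitrary choices for the example. Any restraint that refers to the old residue numbers would have to be updated in the same way.

```python
def residue_numbers(pdb_lines, chain):
    """Return the set of residue numbers found for one chain ID."""
    return {int(line[22:26]) for line in pdb_lines
            if line.startswith(("ATOM", "HETATM")) and line[21] == chain}


def shift_chain(pdb_lines, chain, offset):
    """Return a copy of the records with one chain's residue numbers shifted by offset."""
    shifted = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain:
            new_resid = int(line[22:26]) + offset
            if not -999 <= new_resid <= 9999:
                raise ValueError("residue number no longer fits the 4-column PDB field")
            line = f"{line[:22]}{new_resid:4d}{line[26:]}"
        shifted.append(line)
    return shifted


def ca_record(serial, chain, resid):
    """Build a minimal fixed-column CA record (dummy coordinates) for the example."""
    return (f"ATOM  {serial:5d}  CA  GLY {chain}{resid:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Hypothetical A-C model in which both chains use overlapping residue numbers.
model = [ca_record(i, "A", i) for i in (1, 2, 3)]
model += [ca_record(i + 2, "C", i) for i in (2, 3, 4)]

overlap = residue_numbers(model, "A") & residue_numbers(model, "C")
print("overlapping residue numbers:", sorted(overlap))

shifted = shift_chain(model, "C", 1000)
print("chain C after shift:", sorted(residue_numbers(shifted, "C")))
```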

Thank you very much for your quick and detailed reply!

To continue with Question 1): currently I am doing a four-body docking and I do not impose C2 symmetry. Would imposing it help?

Thank you again and best regards.

Symmetry should indeed help.