Dear HADDOCK team,
I’m trying to use HADDOCK to model a hetero-oligomer of protein 1 (chains A and B) and protein 2 (chains C and D) in a 2:2 stoichiometry. C and D are known to form a homo-dimer, and each copy interacts with one copy of protein 1: A-CD-B. The structural models of the single chains were predicted by AlphaFold 2 with high confidence.
I have several sources of information I would like to incorporate:
Information 1: from several high-resolution crystal structures we know that the last 3 residues of chain C interact with a binding pocket of chain A, and that chain D interacts with chain B in a similar manner. From the crystal structure I took the exact distances between the respective atoms and put them in a .tbl file that I use as unambiguous restraints, with a distance definition of, for example, 2.70 2.70 0.
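To make the format concrete, here is a minimal sketch of how one such restraint line can be written. The segids, residue numbers and atom names are placeholders chosen only for illustration, not the actual values from my .tbl file, and the helper `format_restraint` is hypothetical.

```python
def format_restraint(sel1, sel2, distance, d_minus, d_plus):
    """Return one CNS-style distance restraint line for a .tbl file.

    sel1 and sel2 are (segid, resid, atom_name) tuples.
    """
    def selection(segid, resid, atom):
        return f"(segid {segid} and resid {resid} and name {atom})"

    return (
        f"assign {selection(*sel1)} {selection(*sel2)} "
        f"{distance:.2f} {d_minus:.2f} {d_plus:.2f}"
    )


# Placeholder atoms: NOT the real residues from the crystal structure.
line = format_restraint(("C", 100, "O"), ("A", 50, "N"), 2.70, 2.70, 0.0)
print(line)
```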
Information 2: from another crystal structure we know that chains C and D form a homo-dimer, so we used this information in the same way as for information 1.
Information 3: chains C and D each have a linker region which is known to connect two rigid regions of the protein. I defined the six amino acids of the linker region as fully flexible.
Information 4: we have a high-confidence crosslinking MS dataset from a hetero-bifunctional crosslinker with a spacer length of 3.9 Ångström. I added the crosslinked residues to the same .tbl file mentioned above as unambiguous restraints, with a distance definition of 20 18 5 between the Calpha atoms of the residues.
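For reference, this is how I read the two distance definitions quoted above. It assumes the usual CNS convention that the three numbers are the target distance, the lower-bound correction and the upper-bound correction, so the allowed range runs from distance minus the first correction to distance plus the second. The function name `allowed_range` is only for illustration.

```python
def allowed_range(distance, d_minus, d_plus):
    """Allowed distance range in Angstrom for a 'distance d_minus d_plus' triple.

    Assumes the CNS convention: lower = distance - d_minus,
    upper = distance + d_plus. The lower bound is clamped at zero.
    """
    lower = max(distance - d_minus, 0.0)
    upper = distance + d_plus
    return lower, upper


crystal = allowed_range(2.70, 2.70, 0.0)    # restraints from information 1 and 2
crosslink = allowed_range(20.0, 18.0, 5.0)  # crosslink restraints from information 4

print(crystal)    # (0.0, 2.7)
print(crosslink)  # (2.0, 25.0)
```

If this reading is correct, the crystal-structure restraints act as an upper limit of 2.70 Å, and the crosslink restraints allow Calpha-Calpha distances between 2 and 25 Å.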
Question 1) regarding information 1: Interestingly, in all the runs the docking of chains A and C is correct, but D is not correctly located in the binding pocket of chain B, even though the amount of information is pretty much the same for both interfaces. Do you have any idea why this is the case when using unambiguous restraints with no flexibility in the distance, as described above?
Question 2) regarding information 3: I have noticed that the flexibility of the linker regions is not used during the docking: the input structure superimposes almost perfectly with the resulting model. We assume that this flexibility is actually required for docking, so we expected to see some changes in this region. Do you have any idea why HADDOCK is neglecting this piece of information?
Question 3) regarding information 4: Do you think a shorter distance range for the crosslinking data could help?
Thank you very much in advance and best regards!