Protein-ligand docking without binding knowledge

Hi everyone,

I’m trying to dock a small ligand to a protein. Is it possible for me to select a specific area for HADDOCK to identify possible binding sites?

In addition, as there is no information on how the protein and ligand bind, may I get some advice on preparing the HADDOCK restraints TBL files?

Thank you.

Hi there

I suggest you check these two tutorials:

https://www.bonvinlab.org/education/HADDOCK24/shape-small-molecule/

https://www.bonvinlab.org/education/HADDOCK24/HADDOCK24-binding-sites/

The first one is the way to go if you can identify a template.
See our recent publication: https://doi.org/10.1021/acs.jcim.1c00796

The second one explains, in its second part, how to define a possible binding site on the protein:

https://www.bonvinlab.org/education/HADDOCK24/HADDOCK24-binding-sites/#setting-up-a-new-docking-run-targeting-the-identified-binding-pocket

Hi @amjjbonvin

Thank you so much! =)

Hi @amjjbonvin,

I am trying to specify an area for HADDOCK. However, I’m not sure the shape-restrained protocol is suitable, as there is currently no information on how the ligand would bind. I would like to specify the area on the protein that the ligand can potentially bind to. How should I go about doing that?

Thank you for your help! =)

Check the second tutorial - and the second part of that one

https://www.bonvinlab.org/education/HADDOCK24/HADDOCK24-binding-sites/#setting-up-a-new-docking-run-targeting-the-identified-binding-pocket

You will first have to generate the restraints files.

Hi @amjjbonvin,

Thank you for getting back to me. Stating all the residues in the area as active residues would affect the scoring of the binding site, no? In my case, I would like to specify an area (which contains 285 residues) to be the ‘searched area’. Would setting all 285 residues as active not affect the scoring of the binding site?

Thank you.

With regards,
Eunice

That’s a rather large area - make sure to only specify the solvent accessible ones.
And define those as passive and the ligand as active. In that way the ligand has the freedom to explore the entire region.

You will also have to increase the sampling, e.g. to 10000 models for it0 and 400 each for it1 and water.
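
The advice above (accessible protein residues as passive, ligand as active) can be sketched in Python as follows. This is only a minimal illustration, not the tutorial's own script: the function names, the 15% relative-accessibility cutoff, the segids A/B, the ligand residue number 1 and the accessibility values are all assumptions made for the example. The relative accessibilities would come from whatever accessibility tool you use. Because only the ligand is active, a single ambiguous restraint from the ligand to the whole set of passive residues is written.

```python
def accessible_residues(rel_sasa, cutoff=15.0):
    """Return sorted residue numbers whose relative accessibility (%) is >= cutoff.

    The cutoff is an assumption for this sketch; tune it for your system.
    """
    return sorted(resid for resid, value in rel_sasa.items() if value >= cutoff)


def ligand_air(ligand_resid, protein_resids,
               ligand_segid="B", protein_segid="A", distance=2.0):
    """Build one CNS-style ambiguous restraint: ligand -> any listed protein residue."""
    if not protein_resids:
        raise ValueError("need at least one protein residue")
    lines = [f"assign (resid {ligand_resid} and segid {ligand_segid})", "       ("]
    for i, resid in enumerate(protein_resids):
        if i > 0:
            lines.append("        or")
        lines.append(f"        (resid {resid} and segid {protein_segid})")
    # distance, lower margin, upper margin
    lines.append(f"       ) {distance:.1f} {distance:.1f} 0.0")
    return "\n".join(lines) + "\n"


# Made-up relative accessibilities (percent) for four protein residues.
rel_sasa = {10: 45.2, 11: 3.1, 12: 15.0, 13: 80.4}
passive = accessible_residues(rel_sasa)   # buried residue 11 is dropped
tbl = ligand_air(1, passive)              # ligand is residue 1 of segid B here
print(tbl)
```

The printed text can be saved as a `.tbl` file and compared against the restraints file produced by following the tutorial before it is used in a run.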

Hi @amjjbonvin

I have submitted and gotten results for my calculations. For some reason, the ligand is interacting with residues that are not specified as active. May I know why that’s the case?

Thank you.

There is no energy term that penalises contacts with non-active residues.

This is normal behaviour.

It is a balance between the force field and the restraints defined.
How did you define the active/passive residues?

Hi @amjjbonvin,

I defined the active residues in the “Molecule 1 Active residues (directly involved in the interaction)” section.

HADDOCK will define restraints from those to the ligand (and by default 50% will be randomly removed). But again, this does not prevent the ligand from interacting with other residues. The question is: does it contact some of the active residues you defined?
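
As a toy illustration of the random removal mentioned above (not HADDOCK's actual code), the sketch below discards a random half of a restraint list for one docking trial. The function name, the `n_partitions` argument and the integer restraint IDs are assumptions made for the example.

```python
import random


def sample_restraints(restraints, n_partitions=2, rng=None):
    """Randomly discard 1/n_partitions of the restraints for one docking trial."""
    if rng is None:
        rng = random.Random()
    shuffled = list(restraints)
    rng.shuffle(shuffled)
    n_drop = len(shuffled) // n_partitions
    return sorted(shuffled[n_drop:])


air_ids = list(range(1, 11))                      # ten hypothetical restraints
kept = sample_restraints(air_ids, rng=random.Random(0))
print(f"{len(kept)} of {len(air_ids)} restraints kept for this model")
```

Since each model is guided by a different random subset, a given model does not need to satisfy every restraint, which is one reason the ligand can end up contacting only some of the defined residues.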

Hi @amjjbonvin ,

From what I see, 4 of the top 5 clusters do not interact with the active residues. Anything I should alter in terms of that?

That seems strange - it could be an issue in the way you set up your run.

Can you share (or email me) a link to your result page?

Hi @amjjbonvin

This is the link to the result page (https://wenmr.science.uu.nl/haddock2.4/run/5776824709/94068-auxin-pin1)

Thank you.

With regards,
Eunice

In this run you did not define any active residues as I can see from the json parameter file.

Instead, you used the random AIR option ("ranair": true in the json file).
This cannot be combined with defining active residues.
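
A quick way to check this yourself is to search the downloaded json parameter file for the `ranair` key. The sketch below assumes nothing about the layout of that file and simply walks the whole nested structure; the small inline JSON string (including its `runname` and `settings` keys) is a made-up stand-in for the real file.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Made-up stand-in; for a real run, load the file with json.load(open(path)).
params = json.loads('{"runname": "demo", "settings": {"ranair": true}}')
ranair_values = list(find_key(params, "ranair"))
uses_random_airs = any(value is True for value in ranair_values)
print("random AIRs enabled:", uses_random_airs)
```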

You must have followed the wrong section of the tutorial when setting up your run. If you have a binding site, start at the following section:

And you will need to generate the restraints file to give to HADDOCK as explained in the tutorial.

Another comment: you seem to be using an AlphaFold2 model that contains a lot of “spaghetti” regions, i.e. regions for which no reliable structural prediction could be made. It is best to remove those before docking; including them is a waste of computational resources.
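
One possible way to trim such regions, sketched here as an assumption rather than a recommended protocol: AlphaFold2 models store the per-residue pLDDT confidence score in the B-factor column of the PDB file, so low-confidence residues can be filtered on that column. The cutoff of 70 and the helper names are choices made for this example, and `atom_line` only exists to build a small demo input with the correct fixed-width columns.

```python
def filter_by_plddt(pdb_lines, min_plddt=70.0):
    """Drop ATOM/HETATM records whose B-factor (pLDDT in AlphaFold2 models) is too low."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB columns 61-66 hold the temperature factor.
            if float(line[60:66]) < min_plddt:
                continue
        kept.append(line)
    return kept


def atom_line(serial, name, resname, chain, resseq, bfactor):
    """Build a fixed-width PDB ATOM record at the origin (demo helper only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


model = [
    atom_line(1, "CA", "MET", "A", 1, 95.0),   # confident residue
    atom_line(2, "CA", "GLY", "A", 2, 35.0),   # "spaghetti" residue
    "END",
]
trimmed = filter_by_plddt(model, min_plddt=70.0)
print("\n".join(trimmed))
```

It is worth inspecting the trimmed model visually afterwards, since a purely numerical cutoff can leave short isolated fragments behind.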

Hi @amjjbonvin,

That’s so weird! And noted, I will remove the spaghetti portion.

Thank you!

Hi @amjjbonvin ,

One more question, to save computational resources, would it be alright to just use the area of interest?

Indeed a good approach, but I would try to keep at least complete domains, i.e. do not start cutting halfway through a domain.

To give one example: if you are interested in the SARS-CoV-2 spike protein - ACE2 interaction, there is in principle no need to use the full spike trimer. The RBD is enough (provided you have some knowledge of the binding site). The situation would be different if you were to do ab initio docking.
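
Cutting out a single domain can be sketched as a residue-range selection on one chain of the PDB file. The chain ID and the range used below are placeholders, not real domain boundaries; take the actual boundaries from a domain annotation of your protein so that the cut does not fall inside a domain.

```python
def extract_residue_range(pdb_lines, chain, first, last):
    """Keep ATOM/HETATM records of `chain` with residue numbers in [first, last]."""
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # PDB column 22 is the chain ID, columns 23-26 the residue number.
        if line[21] == chain and first <= int(line[22:26]) <= last:
            kept.append(line)
    return kept


def ca_line(serial, chain, resseq):
    """Build a fixed-width CA ATOM record at the origin (demo helper only)."""
    return (f"ATOM  {serial:5d} {'CA':<4s} {'ALA':>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


structure = [ca_line(i, "A", i) for i in range(1, 6)] + [ca_line(6, "B", 3)]
domain = extract_residue_range(structure, chain="A", first=2, last=4)
print(len(domain), "atoms kept")
```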