Protein-peptide docking

Dear HADDOCK users,

I’m using the restraint-driven docking approach to predict how a 40-residue peptide binds a protein. Because HADDOCK clustered only 50 structures in 10 cluster(s) in all calculations, I have increased the number of models sampled during the calculation to 5000/400/400, as suggested by Prof. Bonvin.

My question is: how can I also change the number of analysed structures? In the directory /structures/it1/water/ I do obtain 400 structures, but in the “analysis” directory only 200 were analysed.

thanks
Francesca

In the guru interface, you specify the number of models to analyse in the analysis parameter menu.

Thanks

I would also like to ask something about the clustering.
I have analysed the resulting clusters manually, and I have also tried to cluster the solutions myself with the “cluster_struc” program, with and without the “full linkage” option. In all cases I get some strange results, but I do not understand my error.
When I inspect such clusters in Chimera and overlay the PDB files contained within the same cluster, I see that these models have different relative orientations, while two more similar complex structures may have been put by the program in two different clusters.
I’m sure that I’m doing something wrong in the clustering, but I do not know what!

thanks Francesca Cantini

Dear Francesca

First of all, check which clustering option was selected on the server. The default might well be FCC, which means you now have to cluster with a different command and cutoff:

	$HADDOCKTOOLS/cluster_fcc.py 

Usage: cluster_fcc.py <matrix file> <threshold [float]> [options]

Options:
  -h, --help            show this help message and exit
  -o OUTPUT_HANDLE, --output=OUTPUT_HANDLE
                        Output File [STDOUT]
  -c CLUS_SIZE, --cluster-size=CLUS_SIZE
                        Minimum number of elements in a cluster [4]
  -s STRICTNESS, --strictness=STRICTNESS
                        Multiplier for cutoff for M->R inclusion threshold.
                        [0.75 or effective cutoff of 0.5625]

The cutoff should be between 0 and 1; the higher the value, the more stringent the requirement for similar values (default is 0.75).
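
To make the cutoff and strictness arithmetic from the help text concrete, here is a minimal Python sketch. It is not the code of cluster_fcc.py: the function names and the asymmetric neighbour rule are assumptions for illustration, chosen only to match the quoted defaults (threshold 0.75, strictness 0.75, effective cutoff 0.5625).

```python
def effective_cutoff(threshold=0.75, strictness=0.75):
    """Relaxed cutoff implied by the help text: threshold * strictness."""
    return threshold * strictness


def is_neighbour(fcc_ab, fcc_ba, threshold=0.75, strictness=0.75):
    """Assumed rule (hypothetical, for illustration only).

    Model B counts as a neighbour of model A when the fraction of common
    contacts seen from A reaches the threshold and the fraction seen from
    B reaches the relaxed, strictness-scaled cutoff.
    """
    return fcc_ab >= threshold and fcc_ba >= effective_cutoff(threshold, strictness)


print(effective_cutoff())          # 0.5625
print(is_neighbour(0.80, 0.60))    # True
print(is_neighbour(0.80, 0.50))    # False
```

Raising the threshold toward 1 makes both conditions harder to satisfy, which is why a higher value yields more, smaller clusters.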

Thanks, but in the $HADDOCKTOOLS directory I have only the cluster_struc program, not the FCC one, I suppose because I’m using HADDOCK 2.1 instead of 2.2.

Which algorithm do you suggest for a peptide-protein complex? I was thinking that RMSD-based clustering would be better in my case than the FCC method.

Are you using the local version of HADDOCK? Did you use the server or not? And which version?
If you use the 2.2 version of the server, you will also have to use the 2.2 local version of HADDOCK to perform a manual cluster analysis.

OK. I’m now always using version 2.2 of HADDOCK, both on the server and locally for manual cluster analysis.
Which algorithm do you suggest for a peptide-protein complex? Perhaps RMSD-based clustering is better in this case than the FCC method?

I always obtain a high number of clusters. Reading the literature, it seems to me that this is usual for protein-peptide docking, but I do not have experience.
Moreover, the cluster containing the complex model that best fulfils the experimental data (such as residue mutations) does not meet the CAPRI criteria: a docking model with i-RMSD or l-RMSD below 1 Å can be considered a high-accuracy prediction, while a medium-quality prediction has an i-RMSD below 2 Å and/or an l-RMSD below 5 Å. Indeed, that cluster never contains the model with the lowest HADDOCK score.
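
The thresholds quoted above can be written as a small classifier. This is only a sketch of the rule as stated in this post: the function name is hypothetical, and the full CAPRI assessment also takes the fraction of native contacts into account, which is left out here. It only makes sense when a reference structure is available to compute the RMSD values against.

```python
def capri_quality(i_rmsd, l_rmsd):
    """Classify a model using only the RMSD thresholds quoted in the post.

    i_rmsd: interface RMSD in Angstrom, l_rmsd: ligand RMSD in Angstrom,
    both measured against a known reference complex.
    """
    if i_rmsd < 1.0 or l_rmsd < 1.0:
        return "high"
    if i_rmsd < 2.0 or l_rmsd < 5.0:
        return "medium"
    return "below medium"


print(capri_quality(0.8, 2.3))   # high
print(capri_quality(1.6, 6.0))   # medium
print(capri_quality(3.2, 9.5))   # below medium
```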

Hi Francesca

For smaller molecules, RMSD might indeed be a better option for clustering. But do reduce the cutoff (e.g. 5 or even 2.5 Å instead of the 7.5 Å default).
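
The effect of the cutoff can be illustrated with a toy example. The sketch below is a generic neighbour-count clustering on a made-up RMSD matrix, not the cluster_struc implementation; the function name, the matrix values and the tie-breaking are all assumptions for illustration.

```python
def cluster_by_cutoff(rmsd, cutoff, min_size=1):
    """Greedy clustering on a square RMSD matrix (list of lists).

    Repeatedly pick the model with the most neighbours within `cutoff`,
    make it a cluster together with those neighbours, and remove them.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Neighbours of each remaining model (a model is its own neighbour).
        neighbours = {
            i: {j for j in remaining if rmsd[i][j] <= cutoff}
            for i in remaining
        }
        # Ties are broken in favour of the lowest model index.
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = neighbours[centre]
        if len(members) < min_size:
            break
        clusters.append(sorted(members))
        remaining -= members
    return clusters


# Made-up pairwise RMSD values (Angstrom) for four models:
# models 0 and 1 are similar, models 2 and 3 are similar.
toy_rmsd = [
    [0.0, 2.0, 6.0, 6.5],
    [2.0, 0.0, 6.0, 7.0],
    [6.0, 6.0, 0.0, 2.0],
    [6.5, 7.0, 2.0, 0.0],
]

print(cluster_by_cutoff(toy_rmsd, 7.5))  # [[0, 1, 2, 3]]
print(cluster_by_cutoff(toy_rmsd, 2.5))  # [[0, 1], [2, 3]]
```

With the 7.5 cutoff, the two distinct orientations end up in one cluster, which is the kind of mixing that can happen for a small peptide; with 2.5 they are separated.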

As for your CAPRI criteria, do you mean you know the answer and the best HADDOCK score cluster is not the closest to the real structure?

If you do not know the answer, then there is no way you can use an RMSD criterion to select your cluster.

I do not know the answer, but I know some key residues, and I know for example that the peptide and one helix of the protein interact N-terminus to N-terminus and C-terminus to C-terminus.

Then you cannot use any RMSD criterion to decide which cluster is better.
Simply analyse/visualize the top-ranked clusters and see how they fit/explain your experimental data.

Thanks for your reply

Dear all

I would like to analyse the fraction of intermolecular contacts within each cluster, not only hydrophobic contacts. Perhaps this is already done by the program? In the analysis directory there is a file named nbcontacts.disp; I guess this file contains all intermolecular hydrophobic contacts. Is it possible to analyse the contacts within each cluster?

thanks

Hi Francesca

The server returns two relevant files for your analysis:

To obtain the same analysis per cluster, you would have to run the analysis locally after downloading the run. This means editing the run.cns file to define the correct directories and CNS executable. Then follow the instructions at:

http://www.bonvinlab.org/software/haddock2.2/analysis/#reanal

You can also use the contact or contact-chainID tools to list all contacts within a given distance cutoff for a given model, e.g.:

$HADDOCKTOOLS/contact cluster1_1.pdb 3.9
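
For the per-cluster contact fraction asked about above, a minimal Python sketch follows. It does not reproduce the contact tool or its output format. The function names are hypothetical, and the sketch assumes standard fixed-column PDB ATOM records with the chain ID column filled in; if your models carry the molecule identifier elsewhere (for example in the segid column), the parsing would need to be adapted.

```python
import math
from collections import Counter


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records from PDB-formatted text (fixed columns)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "chain": line[21],
                "resid": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def residue_contacts(atoms, cutoff=3.9):
    """Set of inter-chain residue pairs with any atom pair within `cutoff`."""
    contacts = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a["chain"] == b["chain"]:
                continue  # keep intermolecular pairs only
            if math.dist(a["xyz"], b["xyz"]) <= cutoff:
                pair = tuple(sorted([(a["chain"], a["resid"]),
                                     (b["chain"], b["resid"])]))
                contacts.add(pair)
    return contacts


def contact_frequency(models, cutoff=3.9):
    """Fraction of models (each a list of atoms) in which each contact occurs."""
    counts = Counter()
    for atoms in models:
        counts.update(residue_contacts(atoms, cutoff))
    return {pair: n / len(models) for pair, n in counts.items()}
```

A possible use would be to read every PDB file listed for one cluster with parse_atoms, pass the resulting list to contact_frequency, and look for residue pairs whose fraction is close to 1. The all-pairs loop is quadratic in the number of atoms, so it is only meant for a handful of models.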

Hi, @amjjbonvin

I am using the server and I am trying to find the number of hydrogen bonds. I have downloaded the complete run and I can see the “ana_hbonds.csh” and “count_hbonds.awk” files in the tools folder. The page you linked in this comment for hydrogen-bond statistics (Bonvin Lab) says to run these scripts manually by copying them from the tools directory into the analysis directory; however, I do not see the analysis directory. Additionally, should I be copying “./ana_hbonds.csh hbonds.disp” into the terminal (although I don’t have the hbonds.disp file)?
I am working on a Mac; do I need a separate program to run these scripts?
I don’t have much knowledge of how to “run scripts”, so I would really appreciate your guidance.

The simplest would be to use some third-party software to analyse the hydrogen bonds.

Running the scripts you mentioned would analyse all models.

There is plenty of software for this. An old one that still does a good job is dimplot (part of ligplot).

A more recent tool is described in “Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures” (PMC).