Violations analysis

Dear all

I have performed two different HADDOCK2.2 calculations: one discarding 50% of the restraints at random for each docking trial (the default), and a second in which I turned this option off. The results are completely different, which suggests that I may have errors in my input data. In both calculations the “active” residues of protein A were derived from NMR data, while for protein B, in the absence of chemical shift assignments for its NMR signals, I used CPORT to predict the active residues.

When I exclude 50% of the restraints at random, only 67% of the HADDOCK structures are clustered, in 29 clusters, whereas when I turn this option off, 97% of the structures are clustered in 4 clusters! In the latter case, however, the HADDOCK score is much higher (positive values) than in the first case, and the restraint violation energies are four times those obtained when 50% of the restraints are excluded.

In the structures of the adducts obtained without random exclusion, the orientation of protein B is similar in the four clusters and the residues of B involved in the interaction are almost the same (those derived from the PINUP predictor), but some “active” residues of protein B are not involved in the interaction at all, so their restraints are not satisfied. All of these residues come from the prediction of one of the predictors used by CPORT. I think this is why the restraint violation energies increase.

My question is: do you think it is correct to turn off the option that excludes restraints at random?

Which output file lists the restraints that are violated? I thought it was the “ana_noe_viol_all.lis” file, but I do not have this file in my directories. Do I need to generate it with the script ana_noe_viol.csh?

How can I improve my calculations? Do you think it is correct to exclude some residues, for example those predicted by CPORT that are not satisfied in the calculations?

Thanks for your suggestions

Francesca Cantini

Hi Francesca

With bioinformatics predictions, in particular CPORT, which is geared to overpredict rather than underpredict, random removal is indeed the way to go.
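To make the idea concrete, here is a toy sketch of random removal: for each docking trial a random fraction of the restraints is discarded (50% with two partitions, as in the default described above), so a falsely predicted residue is not enforced in every trial. This is not HADDOCK's own code; the function name, the residue numbers and the partition logic are assumptions made purely for illustration.

```python
import random


def drop_random_partition(restraints, n_partitions=2, rng=None):
    """Return the restraints kept after discarding one random partition.

    With n_partitions=2, half of the restraints are dropped.
    """
    rng = rng or random.Random()
    shuffled = list(restraints)
    rng.shuffle(shuffled)
    # Discard the first 1/n_partitions of the shuffled list; keep the rest.
    n_drop = len(shuffled) // n_partitions
    return sorted(shuffled[n_drop:])


# Hypothetical predicted active residues of protein B (illustrative only).
predicted_active = [12, 15, 18, 33, 35, 40, 41, 77, 80, 82]

for trial in range(3):
    kept = drop_random_partition(predicted_active, n_partitions=2,
                                 rng=random.Random(trial))
    print(f"trial {trial}: restraints kept for residues {kept}")
```

Because each trial sees a different subset, trials that happen to drop the wrong predictions can reach a low-violation solution, which is why the option suits overpredicted interfaces.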

Removing violated restraints in your case might be dangerous - biasing the results toward one answer…

I would try the following recipe instead:

For molecule A: define as active your NMR-derived residues

For molecule B: define as passive (no active residues) all solvent-accessible residues

You can then turn off random removal if you trust your NMR data.
But do increase the sampling to, say, 10000/400/400 models for the three stages of HADDOCK, respectively.
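The passive list for molecule B could be assembled along these lines. This is only a sketch: it assumes you already have a relative solvent accessibility (in %) per residue from an accessibility program, and the 40% cutoff, the function name and the example values are all assumptions rather than anything prescribed in this thread.

```python
def accessible_residues(rel_accessibility, cutoff=40.0):
    """Return sorted residue numbers whose relative accessibility >= cutoff (%)."""
    return sorted(res for res, rsa in rel_accessibility.items() if rsa >= cutoff)


# Made-up accessibilities for six residues of protein B (illustrative only).
rsa_protein_b = {1: 62.3, 2: 5.1, 3: 48.0, 4: 12.7, 5: 40.0, 6: 91.4}

passive_b = accessible_residues(rsa_protein_b)
# Comma-separated residue list, ready to paste as the passive residues.
print(",".join(str(r) for r in passive_b))  # prints 1,3,5,6
```

A lower cutoff gives a more permissive passive surface; which value is appropriate depends on your system.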

Dear Alex

Thanks for your suggestion. I did as you suggested and obtained 5 clusters. The first one, Cluster 1:

HADDOCK score -53.0 +/- 1.9
Cluster size 351
RMSD from the overall lowest-energy structure 15.8 +/- 0.1
Van der Waals energy -58.9 +/- 4.4
Electrostatic energy -438.7 +/- 24.1
Desolvation energy 17.7 +/- 4.0
Restraints violation energy 760.2 +/- 31.10
Buried Surface Area 1868.6 +/- 79.3
Z-Score -1.0

It seems reasonable, and the restraint violation energy has now also decreased.
Cluster 1 does not contain the overall lowest-energy structure; the cluster that does contain it has only 4 structures and a higher Z-score:

HADDOCK score -43.3 +/- 14.9
Cluster size 4
RMSD from the overall lowest-energy structure 0.6 +/- 0.4
Van der Waals energy -48.6 +/- 2.6
Electrostatic energy -478.8 +/- 82.6
Desolvation energy 30.4 +/- 5.5
Restraints violation energy 706.5 +/- 16.92
Buried Surface Area 1905.9 +/- 63.8
Z-Score -0.6

I will compare cluster 1 with the lowest-energy structure, and I will also take the CPORT prediction into account.
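As a cross-check on the two clusters above, the HADDOCK score can be recomputed from its components. The sketch below assumes the default HADDOCK 2.2 weights for the final water-refinement stage (1.0 van der Waals, 0.2 electrostatic, 1.0 desolvation, 0.1 restraint violation); a run with modified weights would give different numbers. The function and variable names are my own.

```python
# Assumed default weights for the final (water) stage of HADDOCK 2.2.
WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "air": 0.1}


def haddock_score(vdw, elec, desolv, air, weights=WEIGHTS):
    """Weighted sum of the energy terms reported for a cluster."""
    return (weights["vdw"] * vdw + weights["elec"] * elec
            + weights["desolv"] * desolv + weights["air"] * air)


# Cluster averages copied from the statistics above.
cluster_1 = dict(vdw=-58.9, elec=-438.7, desolv=17.7, air=760.2)
cluster_lowest = dict(vdw=-48.6, elec=-478.8, desolv=30.4, air=706.5)

print(round(haddock_score(**cluster_1), 1))       # -52.9 (reported: -53.0)
print(round(haddock_score(**cluster_lowest), 1))  # -43.3 (reported: -43.3)

# Share of the score that comes from restraint violations alone.
print(round(WEIGHTS["air"] * cluster_1["air"], 1))  # 76.0
```

Under these assumed weights, a violation energy of about 760 adds roughly 76 units to the score, so unsatisfied restraints have a visible effect on the ranking.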


Dear All

I would like to ask how I can generate the file “ana_noe_viol_all.lis”.
Can I generate it with the script ana_noe_viol.csh?
I tried but was not able to; perhaps I am missing some steps.
I would like to have a list of violations for each structure, or for the best 4 structures of each cluster.


You need to uncompress the out files.
And besides the ana_noe_viol.csh file, you also need to copy ana_noe_viol.awk and count_noe_viol.awk into the analysis directory.
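The uncompression step could be scripted roughly as follows. This is a sketch that assumes the out files are gzip-compressed and end in ".out.gz"; the helper name and the directory in the usage comment are hypothetical, so adapt them to your own run.

```python
import gzip
import shutil
from pathlib import Path


def uncompress_out_files(directory):
    """Gunzip every *.out.gz file in `directory`, keeping the originals."""
    written = []
    for gz_path in sorted(Path(directory).glob("*.out.gz")):
        out_path = gz_path.with_suffix("")  # strips the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


# Example call (hypothetical path, adjust to your run directory):
# uncompress_out_files("run1/structures/it1/water")
```

The compressed files are left in place, so the step can be repeated safely.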

But this will not give the violations per cluster but rather for all models in the analysis directory. To perform a per-cluster re-analysis refer to:

Thanks a lot

I guess that to perform the per-cluster analysis I need to run HADDOCK on a local PC?


Yes indeed - local install required - and you will need to edit the paths in run.cns after downloading the full run archive from the server.