Question about restricting sampling to a selected region of the surface

Hi, everyone!
I am trying to dock protein A onto a selected surface region of protein B. In the calculation, I set the solvent-accessible surface residues of protein A (surface area > 50 Å^2) as active residues.

For protein B, I have tried two approaches. The first is to set the solvent-accessible residues in the selected region of protein B as active residues, with the random AIR definition turned off (ranair=false in run.cns). But this job fails because too many structures fail in the it0 stage. (In another protein docking test with different molecules, this works.) I searched the forum for a solution, and the suggestion there is to change the random seed. I tried this, but it doesn't work. Here is the error output, to help others locate the issue:
HADDOCK cannot continue due to failed structures in it0
HADDOCK could not copy failed structures from previous iteration
The following structures could not be docked:

The other approach is to leave protein B's active residue setting blank and set nseg_B = 1, B_start_seg_1="1", B_end_seg_1="410" and ranair=false. However, many structures also failed in the it0 stage, so I killed the job. By the way, after I kill the job with the job system command, KeepAlive.py keeps generating the out file, so I have to log in to the node to kill the KeepAlive.py job. Is there a more convenient way to clean up the jobs, for example with the haddock-clean.py script?

So, are the many failed structures caused by an incorrect setting of the active residues? In the semi-flexible segment setting, many residues in the sequence from B_start_seg_1 to B_end_seg_1 are not surface residues. Does that cause problems? If so, how should I specify the IDs of discontinuous surface residues in this part?
Thanks very much for your time and response!

Hi! Errors in it0 generally indicate there is something wrong with the restraints.

Setting all solvent accessible surface residues as active is not ideal. In this manner the whole surface is expected to be in contact with the partner, which is not possible.

If you do not have any information, you could try center-of-mass restraints or randomly defined AIRs.

Thanks for your suggestion!

Yes. At first I also thought it was not appropriate, but in my other tests it works. The reason may be that HADDOCK can randomly exclude part of the AIR restraints. But in the calculation I mentioned above, it doesn't work.

How can I dock protein A to the selected surface region of protein B? I have set the surface in run.cns with nseg_B = 1, B_start_seg_1="1" and B_end_seg_1="410". Of course, many residues between residue 1 and residue 410 are not surface residues. But many calculations in the it0 stage crashed. How can I solve this problem?

Thanks again!

The parameter nseg defines flexible regions; by default it is set to automatic (-1). This means that during the semi-flexible stage, regions that are part of an interface will automatically be defined as flexible.

Maybe what you are looking for is how to define the restraints. You can find more information here: http://www.bonvinlab.org/software/haddock2.2/generate_air_help and here: http://www.bonvinlab.org/education/HADDOCK/

Are all the it0 docking runs failing, i.e. is not a single model generated?

Always good to check the out files and search for error messages.
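As a rough illustration of that check, here is a minimal Python sketch that scans a directory of out files for lines containing the word "error". The function name, the `*.out` file pattern and the keyword are assumptions made for this example; they are not part of HADDOCK itself.

```python
from pathlib import Path


def find_error_lines(directory, pattern="*.out", keyword="error"):
    """Return (file name, line number, text) for every line containing keyword.

    The match is case-insensitive. The pattern and keyword defaults are
    assumptions for this sketch, not HADDOCK settings.
    """
    hits = []
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if keyword.lower() in line.lower():
                    hits.append((path.name, number, line.strip()))
    return hits
```

Calling `find_error_lines(...)` on the directory that holds the it0 out files of your run returns one tuple per matching line, which can then be printed or counted per file.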

Do I understand correctly that you have no info for proteinA and want to limit the sampling of the surface of proteinB?
Not so simple to do that.

One way of doing that on the server would be to define the surface residues of proteinB (the limited surface region) as active and all solvent-accessible residues of proteinA as passive. Then increase the random removal of restraints to a large fraction, e.g. 90%. Do the math to figure out what the number should be: for 90% random removal the number should be 1.111111.
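The arithmetic behind that number can be sketched as follows, assuming the setting is a number of partitions n of which one part (a fraction 1/n of the restraints) is randomly removed for each model. That reading is what the 1.111111 value for 90% removal implies. The function name is hypothetical.

```python
def partitions_for_removal(fraction_removed):
    """Return n such that removing one part out of n drops this fraction.

    Assumes fraction removed = 1 / n, so n = 1 / fraction removed.
    """
    if not 0 < fraction_removed <= 1:
        raise ValueError("fraction_removed must be in (0, 1]")
    return 1.0 / fraction_removed


print(round(partitions_for_removal(0.90), 6))  # 1.111111
print(partitions_for_removal(0.50))            # 2.0
```

Under the same assumption, a value of 2 corresponds to removing 50% of the restraints for each model.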

Actually, nseg is also used with ranair to limit the sampling to specific regions.
See: http://www.bonvinlab.org/software/haddock2.2/generate_air_help/#ranair

If nseg is set to a negative number, in combination with ranair=true, the flexible residues will be automatically selected, but the random AIR sampling will be limited to the segments defined in run.cns.
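If the region of interest is a discontinuous set of surface residues, one option is to collapse the residue numbers into contiguous segments and define one start/end pair per segment. The sketch below does only that grouping. The residue numbers are made up, the function names are hypothetical, and the printed lines simply follow the B_start_seg_1 / B_end_seg_1 pattern quoted earlier in this thread, so check them against your own run.cns before using them.

```python
def residues_to_segments(residues):
    """Collapse residue numbers into (start, end) runs of consecutive IDs."""
    segments = []
    for res in sorted(set(residues)):
        if segments and res == segments[-1][1] + 1:
            # Extends the current run by one residue
            segments[-1] = (segments[-1][0], res)
        else:
            # Gap in numbering: start a new segment
            segments.append((res, res))
    return segments


def format_segments(segments, chain="B"):
    """Format segments following the naming pattern quoted in this thread."""
    return [
        f'{chain}_start_seg_{i}="{start}"; {chain}_end_seg_{i}="{end}";'
        for i, (start, end) in enumerate(segments, start=1)
    ]


# Hypothetical surface residues of the region to sample
example = [12, 13, 14, 15, 40, 41, 88]
for line in format_segments(residues_to_segments(example)):
    print(line)
```

For the example list this gives three segments: 12 to 15, 40 to 41, and 88 on its own.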

One more question about the failed runs: Are those crashing because of errors? Or are they killed because they reach some queue limit?

I have checked the out files of the failed jobs. The error indicates “there is not enough memory available to the program.” %ALLHP error encountered: not enough memory available (CNS is in mode: SET ABORT=NORMal END). Some of the jobs crashed; others succeeded.

I have read this page on ranair. However, I set nseg_B=1 (the number of segments specified further down in run.cns) rather than a negative value (-1). One question about the segment definition: the limited surface region of protein B that I want to sample runs from residue 1 to residue 410, so I set B_start_seg_1="1"; B_end_seg_1="410". Is that OK? I am not sure, because many of those residues are not surface residues.

Yes, this is what I want to do, as you say: "you have no info for proteinA and want to limit the sampling of the surface of proteinB".

Actually, I have one question about the parameter ncvpart in run.cns. Does it affect the result very much, even if I have good sampling of the surface? In some of my results, the interfaces of the resulting complexes fall into two classes on opposite sides of the protein, and the default setting of ncvpart is 2. One possible reason for this observation is the shape of the protein and the distribution of the potential active sites. But I am also worried that it may be an effect of the ncvpart setting. More tests are needed to rule this out.

Sorry for my long response. I really appreciate your time and helpful answers.

It looks like you are running into a memory problem on your compute node.

Part of the problem must come from defining the entire surface as active residues… Too many distances defined.

Move more towards ab-initio docking and filter the solutions afterwards.

For your scenario I would suggest looking into LightDock instead of HADDOCK.

For the code see: https://github.com/brianjimenez/lightdock

And the publication describing the use of info to bias the sampling:

• J.L. Roel Touris, A.M.J.J. Bonvin and B. Jimenez-Garcia. 
      LightDock goes information-driven. Bioinformatics, Advanced Online Publication, btz642 (2019).
      https://doi.org/10.1093/bioinformatics/btz642