Docking of 2000 protein-antibody pairs using HADDOCK

I was using a local installation of HADDOCK for protein-antibody docking. I had three datasets: positive (71 pairs), negative (288 pairs) and complete (2000 pairs). The HADDOCK runs were automated with a Python script, which ran successfully for the positive and negative datasets. However, when running the 2000 antigen-antibody pairs, it crashed multiple times. In several cases the output looked like this:

Starting HADDOCK on: 2021-02-01 18:56:35

HADDOCK version: 2.4 - September 2020 release
Python version: 2.7.17 (default, Sep 30 2020, 13:38:04)
[GCC 7.5.0]
parsing run.cns file
parsing run.param in /home/iiitd/gayatri/antibodies_pdbs/120-updated-variants/2b04_ab/run2b04_ab_min_3007k8x_V445A/data
reading parameters from the file /home/iiitd/gayatri/antibodies_pdbs/120-updated-variants/2b04_ab/run2b04_ab_min_3007k8x_V445A/data/run.param
setting some variables:
N_COMP set to: 2
RUN_NUMBER set to: 2b04_ab_min_3007k8x_V445A
HADDOCK_DIR set to: /home/iiitd/gayatri/haddock2.4-2020-09
UNAMBIG_TBL set to: 2b04_ab_renum-unambig.tbl
PROT_SEGID_2 set to: B
PROT_SEGID_1 set to: A
PDB_FILE1 set to: 2b04_ab_renum.pdb
AMBIG_TBL set to: 2b04-7k8x-ambig.tbl
PROJECT_DIR set to: .
PDB_FILE2 set to: …/min_3007k8x_V445A.pdb
N_COMP 2
RUN_NUMBER 2b04_ab_min_3007k8x_V445A
HADDOCK_DIR /home/iiitd/gayatri/haddock2.4-2020-09
UNAMBIG_TBL 2b04_ab_renum-unambig.tbl
PROT_SEGID_2 B
PROT_SEGID_1 A
PDB_FILE1 2b04_ab_renum.pdb
AMBIG_TBL 2b04-7k8x-ambig.tbl
PROJECT_DIR .
PDB_FILE2 …/min_3007k8x_V445A.pdb
looking for existing files
waterdock false
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
waiting for the psf files…
No docking was performed. The generate_2.out file in the begin directory showed this:

( SER 803 N ) 6644.0
EVALUATE: symbol $ID2 set to 6644.00 (real)
SELRPN: 1 atoms have been selected out of 9649
( SER 803 N )
EVALUATE: symbol $SEGID2 set to "" (string)
SELRPN: 1 atoms have been selected out of 9649
( SER 803 N ) 803
EVALUATE: symbol $RESID2 set to "803" (string)
SELRPN: 1 atoms have been selected out of 9649
( SER 803 N ) SER
EVALUATE: symbol $RESN2 set to "SER" (string)
SELRPN: 1 atoms have been selected out of 9649
SHOW: sum over selected elements = 1.000000
NEXTCD: condition evaluated as false
SELRPN: 1 atoms have been selected out of 9649
SELRPN: 1 atoms have been selected out of 9649
GEOM= 1.335219
NEXTCD: condition evaluated as false
FOR ID LOOP: symbol ID1 set to 6650.00 (real)
SELRPN: 1 atoms have been selected out of 9649
( SER 803 C )

This kept all cores of the system at 100% utilisation and produced no results. I also observed that a single HADDOCK run spawns around 680 background tasks, keeping all cores busy. Could the failure be related to the 100% CPU utilisation?

In a few other cases the run appeared to work but produced no complexes, with generate_2.out showing this:

--------------- cycle= 100 ------ stepsize= 0.0003 -----------------------
| Etotal =154.401 grad(E)=3.037 E(BOND)=0.166 E(ANGL)=126.590 |

E(DIHE)=17.506 E(IMPR)=0.254 E(VDW )=9.884

NBONDS: found 154320 intra-atom interactions
--------------- cycle= 150 ------ stepsize= 0.0002 -----------------------
| Etotal =143.248 grad(E)=3.036 E(BOND)=0.172 E(ANGL)=126.550 |

E(DIHE)=6.030 E(IMPR)=0.285 E(VDW )=10.212

--------------- cycle= 200 ------ stepsize= 0.0000 -----------------------
| Etotal =139.753 grad(E)=3.033 E(BOND)=0.175 E(ANGL)=126.620 |

E(DIHE)=2.488 E(IMPR)=0.189 E(VDW )=10.282

--------------- cycle= 250 ------ stepsize= -0.0002 -----------------------
| Etotal =138.643 grad(E)=3.029 E(BOND)=0.171 E(ANGL)=126.444 |

E(DIHE)=1.733 E(IMPR)=0.167 E(VDW )=10.127

--------------- cycle= 300 ------ stepsize= -0.0002 -----------------------
| Etotal =138.396 grad(E)=3.028 E(BOND)=0.169 E(ANGL)=126.400 |

E(DIHE)=1.488 E(IMPR)=0.161 E(VDW )=10.178

--------------- cycle= 350 ------ stepsize= -0.0002 -----------------------
| Etotal =138.392 grad(E)=3.028 E(BOND)=0.169 E(ANGL)=126.399 |

E(DIHE)=1.488 E(IMPR)=0.161 E(VDW )=10.176

--------------- cycle= 400 ------ stepsize= -0.0005 -----------------------

Could you please advise how I can run docking for such a large dataset, or where I am going wrong?
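One way to test whether CPU saturation plays a role is to throttle the batch so that only a fixed number of runs are active at once. The sketch below is a minimal illustration in Python 3 (3.5 or later), not the automation script from this post: `run_batch`, `run_one`, `max_parallel` and `timeout_s` are hypothetical names, and `command` stands for whatever command your own script already uses to launch HADDOCK inside a run directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(run_dir, command, timeout_s=None):
    """Run `command` inside `run_dir` and return (run_dir, status)."""
    try:
        proc = subprocess.run(
            command,
            cwd=run_dir,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # Only the direct child is killed on timeout; jobs it spawned
        # in the background may need separate cleanup.
        return run_dir, "timeout"
    if proc.returncode == 0:
        status = "ok"
    else:
        status = "failed(%d)" % proc.returncode
    return run_dir, status


def run_batch(run_dirs, command, max_parallel=1, timeout_s=None):
    """Process run directories with at most `max_parallel` active at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        jobs = pool.map(lambda d: run_one(d, command, timeout_s), run_dirs)
        return list(jobs)
```

Starting with `max_parallel=1` and a generous `timeout_s` makes it easier to tell a run that hangs on its own from one that only stalls when the machine is overloaded.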

This looks like a problem with your input data, which is why no topology is generated. Check whether something is wrong with these 2000 models: numbering, chain IDs, molecule composition, heteroatoms, etc.

The logs you posted here do not contain any relevant information about the errors. Please check the last lines of begin/generate_x.out to see why the topology could not be generated in each particular case.
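With ~2000 runs, checking those last lines by hand is tedious, so a short script can collect them. This is only a sketch under assumptions: it is written for Python 3, it assumes every run directory sits directly under one root folder and is named `run*` (as in the paths shown in the question), and `tail_generate_logs` is a hypothetical helper name.

```python
import glob
import os
from collections import deque


def tail_generate_logs(runs_root, n=15):
    """Map each begin/generate_*.out under runs_root to its last n lines."""
    # Assumed layout: <runs_root>/run*/begin/generate_*.out
    pattern = os.path.join(runs_root, "run*", "begin", "generate_*.out")
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            # deque with maxlen keeps only the final n lines of the file
            last = deque(fh, maxlen=n)
        tails[path] = [line.rstrip("\n") for line in last]
    return tails


def print_tails(runs_root, n=15):
    """Print the tail of every generate_*.out, one block per file."""
    for path, lines in tail_generate_logs(runs_root, n).items():
        print("==> %s" % path)
        for line in lines:
            print("    " + line)
```

Calling `print_tails` on the folder that holds the run directories gives one block per file, which can then be searched for error messages.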

Good luck!

Thanks for the quick response. I have 18 antibodies that were docked against 120 antigen protein structures, giving ~2000 pairs in total. One case in which the docking terminated had this in its begin/generate_x.out file:

HEAP: maximum use = 20827112 current use = 96000 bytes
HEAP: maximum overhead = 3008 current overhead = 752 bytes
VCLOSE: Display file reset to OUTPUT.
============================================================
Maximum dynamic memory allocation: 20827112 bytes
Maximum dynamic memory overhead: 3008 bytes
Program started at: 14:40:33 on 31-Jan-2021
Program stopped at: 15:17:03 on 31-Jan-2021
CPU time used: 24.0398 seconds
============================================================
Regarding something being wrong with the structures: in a few cases the same antigen model docked properly with one antibody but failed with another. The antibody structures should also be properly modelled, since all 18 antibodies worked well with the negative and positive datasets. Below is an example of the same antigen with different antibodies, successful in one case and failed in the other.

generate_2.out for the successful run:

EVALUATE: symbol $OUTSTRING set to ""BEGIN:min_3007k8x_N440Y_1.pdb"" (string)
EVALUATE: symbol $NUM set to 2.00000 (real)
NEXTCD: condition evaluated as false
SELRPN: 9538 atoms have been selected out of 9538
ASSFIL: file min_3007k8x_N440Y.psf opened.
ASSFIL: file min_3007k8x_N440Y.psf opened.
HEAP: maximum use = 20768648 current use = 96000 bytes
HEAP: maximum overhead = 3008 current overhead = 752 bytes
VCLOSE: Display file reset to OUTPUT.
============================================================
Maximum dynamic memory allocation: 20768648 bytes
Maximum dynamic memory overhead: 3008 bytes
Program started at: 13:31:36 on 31-Jan-2021
Program stopped at: 13:43:13 on 31-Jan-2021
CPU time used: 23.6038 seconds

generate_2.out for the failed run:

EVALUATE: symbol $OUTSTRING set to ""BEGIN:min_3007k8x_N440Y_1.pdb"" (string)
EVALUATE: symbol $NUM set to 2.00000 (real)
NEXTCD: condition evaluated as false
SELRPN: 9538 atoms have been selected out of 9538
ASSFIL: file min_3007k8x_N440Y.psf opened.
ASSFIL: file min_3007k8x_N440Y.psf opened.
HEAP: maximum use = 20768648 current use = 96000 bytes
HEAP: maximum overhead = 3008 current overhead = 752 bytes
VCLOSE: Display file reset to OUTPUT.
============================================================
Maximum dynamic memory allocation: 20768648 bytes
Maximum dynamic memory overhead: 3008 bytes
Program started at: 13:58:57 on 31-Jan-2021
Program stopped at: 14:25:59 on 31-Jan-2021
CPU time used: 23.5330 seconds
============================================================
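The two footers differ mainly in elapsed time: the successful run took 697 s of wall-clock time for 23.6 s of CPU, the failed one 1622 s for 23.5 s of CPU. A small parser makes it easy to extract this for every run. The sketch below is for Python 3, `parse_footer` is a hypothetical helper name, and it relies on English month abbreviations such as "Jan" being parsed by `%b`, which holds in the default C/English locale. A large gap between the two numbers may mean the job spent most of its time waiting rather than computing, but that is an interpretation and not something the logs state.

```python
import re
from datetime import datetime

# Matches e.g. "13:58:57 on 31-Jan-2021" as printed in the footer above
STAMP = r"(\d{2}:\d{2}:\d{2}) on (\d{2}-\w{3}-\d{4})"


def parse_footer(text):
    """Return (wall_seconds, cpu_seconds) from a log footer, or None."""
    start = re.search(r"Program started at:\s*" + STAMP, text)
    stop = re.search(r"Program stopped at:\s*" + STAMP, text)
    cpu = re.search(r"CPU time used:\s*([\d.]+)\s*seconds", text)
    if not (start and stop and cpu):
        # Footer incomplete, e.g. the job never finished
        return None
    fmt = "%H:%M:%S %d-%b-%Y"
    t0 = datetime.strptime(" ".join(start.groups()), fmt)
    t1 = datetime.strptime(" ".join(stop.groups()), fmt)
    return (t1 - t0).total_seconds(), float(cpu.group(1))
```

A return value of `None` marks files where the footer is missing altogether, which would be the case for runs that never completed topology generation.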

Look for error messages in the generate_X.out file in case of failures, starting from the bottom of the file.

If the psf files are not generated, the docking cannot proceed.
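To find which of the ~2000 runs are affected, one option is to count the psf files in each run. This sketch for Python 3 rests on assumptions: that the psf files are written to the begin directory of each run, that run directories are named `run*` under a common root, and that at least one psf per molecule is expected (two here, matching N_COMP set to 2 in the log above). `runs_missing_psf` is a hypothetical helper name.

```python
import glob
import os


def runs_missing_psf(runs_root, expected=2):
    """List (run_dir, psf_count) for runs with fewer psf files than expected."""
    missing = []
    for run_dir in sorted(glob.glob(os.path.join(runs_root, "run*"))):
        if not os.path.isdir(run_dir):
            continue
        # Assumed location of the topology files: <run_dir>/begin/*.psf
        psf_files = glob.glob(os.path.join(run_dir, "begin", "*.psf"))
        if len(psf_files) < expected:
            missing.append((run_dir, len(psf_files)))
    return missing
```

The runs it returns are the ones whose generate_X.out files are worth reading first.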