Ene-residue analysis takes long

Dear HADDOCK community,

I’m running HADDOCK2.4 locally, submitting jobs via qsub. The task is to dock two proteins (800 aa and 300 aa) to get some hints about their interaction. I use the coarse-grained representation, with 50000 structures in it0, 5000 in it1 and 5000 in it2.

The point where the computation becomes inefficient is here (fragment of the HADDOCK output):

  BEGIN: Mon Nov 15 08:36:18 2021
  Read 5000x5000 distance matrix in 14 seconds
  Writing 354 Clusters
  Coverage 67.34% (3367/5000)
  END: Mon Nov 15 08:36:33 2021 [14.86 seconds]
  Clustering in /home/project/cg_prim_core_50k/run1/structures/it1/analysis DONE
  Check file /home/project/cg_prim_core_50k/run1/structures/it1/analysis/cluster.out
  waiting for the ene-residue file in it1/analysis…

This step runs on a single CPU (but it looks like it could easily be parallelized, as it just writes energies between pairs of residues to a single file; the pipeline could merge several such files afterwards). Does parallelization of this step make sense?

Also, does analyzing 5000 structures in it1 make sense? Or should I rather generate 100,000 in it0 and then analyze 1000 in it1? I really wanted to sample the space of possibilities, as I have no clues about the interaction, and the run on the web server, with its limited number of structures, was inconclusive.
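For reference, if I understand the run.cns parameters correctly, these numbers would be controlled by something like the following (values shown for the 100,000 / 1000 scenario; please correct me if I have the names wrong):

{===>} structures_0=100000; ← number of structures generated in it0 (rigid-body docking)
{===>} structures_1=1000; ← number of structures refined in it1 (semi-flexible refinement)
{===>} anastruc_1=1000; ← number of structures analyzed after it1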

I’d be grateful for any suggestions.

Our default sampling is 1000/200/200 or 10000/400/400 in ab-initio mode.
And from time to time we may push it to 50000 in it0 / 1000 in it1.

Your sampling numbers are much higher…

This is of course coming at a price.

Especially if you have some information to drive the docking, you don’t need such high sampling.

PS: The numbers you report are actually quite efficient! Clustering of 5000 models in 15 s…

It1 and water are the more costly parts.

So I’m not sure what you qualify as inefficient…

I meant inefficient only for that ene-residue job, and not as a criticism of the software itself but rather of my overall approach. HADDOCK itself runs impressively fast (I have a single 128-thread CPU).
Maybe I should decrease it1 to 1000, since there will be a considerable set of models generated in it0 to choose from anyway.

I skip water as I’m using the coarse-grained representation:
{===>} firstwater="no"; ← when this option was left on (the default), the procedure crashed in it1

{===>} waterdock=false; ← I wasn’t sure whether I wanted that, so I disabled it as well.

I meant inefficient only for that ene-residue job, and not as a criticism of the software itself but rather of my overall approach. HADDOCK itself runs impressively fast (I have a single 128-thread CPU).

I would simply skip that part, unless you need to extract those metrics. For ab-initio docking it is certainly very inefficient, as you are basically calculating it for the entire surface.

Setting the analysis to clustering only will be much more efficient.
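If I am quoting the run.cns parameter correctly, this corresponds to something like:

{===>} runana="cluster"; ← cluster-based analysis only, instead of the full per-structure analysis ("full"); please double-check the exact name and values in your run.cns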

I skip water as I’m using the coarse-grained representation:
{===>} firstwater="no"; ← when this option was left on (the default), the procedure crashed in it1

It would be strange for it to crash at it1, since the water refinement is only performed after it1…

{===>} waterdock=false; ← I wasn’t sure whether I wanted that, so I disabled it as well.

You don’t want it - trust the default settings… And it is not possible with CG anyway.

Ah, sorry, yes, it crashed after it1, in a smaller coarse-grained test run.