haddock3: How to not save the process files

silence · May 21, 2025, 3:28am

My .cfg configuration file is as follows:

run_dir = "run-test"
mode = "local"
ncores = 64

molecules = [
    "data/1.pdb",
    "data/2.pdb"
]

[topoaa]
autohis = false

[rigidbody] 
tolerance = 5
sampling = 10000
surfrest = true
cmrest = true

[seletop]
select = 400

[flexref]
tolerance = 5

[emref]
tolerance = 5

[clustfcc]
min_population = 4

[seletopclusts]
top_models = 10

[caprieval]
# reference_fname = "data/1dee.pdb"

[prodigyprotein]
chains =  ["A", "B"] 
to_pkd = true

Each step generates its own folders and files, and I don’t want to keep so many. We are less interested in the docking process itself than in obtaining the score directly, without all the intermediate files. Is it possible to use parameters in the .cfg configuration file so that the files of a given step are not saved?

amjjbonvin · May 21, 2025, 6:27am

Do you want to only score an existing model of a complex? And not redock?

If you only want to get the HADDOCK score, you can use haddock3-score with the PDB file of your complex as argument.
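To score several existing complexes this way, the command could be wrapped in a small script. The following is only a sketch: it assumes haddock3 is installed and that the result line has the form reported later in this thread (HADDOCK-score (emscoring) = -323.9506). The function names are made up for illustration, and which output stream carries the line is an assumption, so both are searched.

```python
import re
import subprocess

# Pattern taken from the output line quoted in this thread:
#   HADDOCK-score (emscoring) = -323.9506
SCORE_RE = re.compile(r"HADDOCK-score \(emscoring\) = (-?\d+(?:\.\d+)?)")


def parse_haddock_score(output):
    """Extract the score from the text printed by haddock3-score."""
    match = SCORE_RE.search(output)
    if match is None:
        raise ValueError("no HADDOCK-score line found in output")
    return float(match.group(1))


def score_complex(pdb_path):
    """Run haddock3-score on one PDB file and return its score.

    Assumption: the score line appears on stdout or stderr.
    """
    result = subprocess.run(
        ["haddock3-score", str(pdb_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_haddock_score(result.stdout + "\n" + result.stderr)
```

Calling score_complex("data/7fbk.pdb") in a loop over a list of files would then collect one score per complex without setting up a docking run.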

In the current workflow you are performing an ab-initio docking run with increased sampling.
This does indeed generate a lot of files, which you can delete afterwards. There is no way of keeping all those files in memory.
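Deleting the files afterwards can be scripted. The sketch below assumes that haddock3 creates one numbered directory per module inside run_dir (for example 1_rigidbody); that naming pattern, the function name, and the choice of which modules to keep are assumptions for illustration. It defaults to a dry run so nothing is removed until the list has been checked.

```python
import re
import shutil
from pathlib import Path

# Assumption: step directories are named "<index>_<module>", e.g. "1_rigidbody".
STEP_DIR_RE = re.compile(r"^(\d+)_(\w+)$")


def remove_intermediate_steps(run_dir, keep_modules, dry_run=True):
    """Delete step directories whose module name is not in keep_modules.

    Returns the names of the directories that were (or would be) removed.
    Anything that does not look like a step directory is left untouched.
    """
    removed = []
    for entry in sorted(Path(run_dir).iterdir()):
        match = STEP_DIR_RE.match(entry.name)
        if not entry.is_dir() or match is None:
            continue
        if match.group(2) in keep_modules:
            continue
        removed.append(entry.name)
        if not dry_run:
            shutil.rmtree(entry)
    return removed
```

For the workflow above, remove_intermediate_steps("run-test", {"caprieval", "prodigyprotein"}) would list everything except the final analysis steps; passing dry_run=False performs the deletion.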

Note that you don’t need surfrest to be turned on when you use cmrest - it unnecessarily increases the computational time.

silence · May 21, 2025, 9:48am

I tried the method you provided for haddock3-score and discovered some problems.

There are three questions:

What is the scoring logic of haddock3-score? How does it select the chains? If a complex contains multiple chains and I only want the docking score for two of them, what should I do?
For example, I downloaded the 7fbk.pdb complex from the PDB website. It contains four chains (A, B, C, and D), along with some other molecules. Without any processing, I ran haddock3-score data/7fbk.pdb directly, and the output was HADDOCK-score (emscoring) = -323.9506. Then I kept only the four protein chains and removed everything else, and the score was -289.3158. Finally, I kept only chains A and C, and the score became -127.0435.
What is the difference between the score computed directly with haddock3-score and the emscoring.tsv score obtained from a docking run?
For example, in the first method I downloaded the 7fbk.pdb complex from the PDB website and kept only the required A and C chains. Running haddock3-score data/7fbk.pdb gave HADDOCK-score (emscoring) = -127.0435.
In the second method I used the FASTA sequences of chains A and C of the 7fbk.pdb complex to generate a separate PDB file for each chain: 7fbk_a.pdb and 7fbk_c.pdb. I then ran the docking with the following .cfg configuration file:

run_dir = "run-test"
mode = "local"
ncores = 64

molecules = [
    "data/7fbk_a.pdb",
    "data/7fbk_c.pdb"
]

[topoaa]
autohis = false

[rigidbody]
tolerance = 5
sampling = 10000
cmrest = true

[seletop] 
select = 400

[flexref]
tolerance = 5

[emref] 
tolerance = 5

[clustfcc]
min_population = 4

[seletopclusts]
top_models = 10

[emscoring]

The best score in the resulting emscoring.tsv file was:

structure	original_name	md5	score
emscoring_1.pdb	cluster_1_model_1.pdb	None	-105.694
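A table in this tab-separated layout can be read with the standard csv module. The sketch below assumes the four column names shown above and relies on the HADDOCK convention that a lower (more negative) score is better; the function name is made up for illustration.

```python
import csv
import io


def best_model(tsv_text):
    """Return the row with the lowest score from emscoring.tsv content.

    Assumes a tab-separated table with a header line containing a
    "score" column, as in the excerpt quoted in this thread.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return min(rows, key=lambda row: float(row["score"]))
```

Reading the file with open("emscoring.tsv").read() and passing the text to best_model returns a dictionary whose "structure" and "original_name" entries identify the model.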

What docking score threshold is used to decide whether two molecules are bound?

amjjbonvin · May 21, 2025, 11:41am

By default HADDOCK scores all chains.
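Because all chains in the file are scored, restricting the score to two chains means removing the others first, as was done by hand in the question. A minimal sketch of that filtering step follows. It assumes a standard fixed-column PDB file, where the chain identifier is in column 22, and it keeps only ATOM records, so HETATM entries such as ligands and waters are dropped. The function name is made up for illustration.

```python
def select_chains(pdb_text, chains):
    """Keep only ATOM records that belong to the requested chains.

    The chain ID is read from column 22 of each record (index 21).
    HETATM and all other record types are discarded, and a final END
    record is appended.
    """
    kept = [
        line
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21 and line[21] in chains
    ]
    kept.append("END")
    return "\n".join(kept) + "\n"
```

For the 7fbk example, select_chains(text, {"A", "C"}) would produce the two-chain file that was then passed to haddock3-score.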

If you use, for example, the emscoring module, you can turn on the option to score by interface.
The information will then be written to the header of the PDB files:

per_interface_scoring = true

What is the difference between the score computed directly with haddock3-score and the emscoring.tsv score obtained from a docking run?
For example, in the first method I downloaded the 7fbk.pdb complex from the PDB website and kept only the required A and C chains. Running haddock3-score data/7fbk.pdb gave HADDOCK-score (emscoring) = -127.0435.
In the second method I used the FASTA sequences of chains A and C of the 7fbk.pdb complex to generate a separate PDB file for each chain: 7fbk_a.pdb and 7fbk_c.pdb. I then ran the docking with the following .cfg configuration file:

In your second approach you are redocking the complex, which is a pure waste of time for what you are doing!
You could combine the two files into one and use the following workflow:

run_dir = "run-test"
mode = "local"
ncores = 64

molecules = [
    "data/7fbk_a+c.pdb"
]

[topoaa]

[emscoring]
per_interface_scoring = true

You can even generate one ensemble of models for scoring purposes. This ensemble file should use the MODEL/ENDMDL way of combining multiple models (it can be generated from a set of single models using the pdb_mkensemble command).
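For illustration, the MODEL/ENDMDL layout can be sketched in a few lines of Python. This is not the pdb_mkensemble implementation, only a simplified stand-in that shows the file structure: each input model is wrapped in a numbered MODEL record and closed with ENDMDL, and only coordinate records (ATOM, HETATM, TER) are copied. The function name is made up for illustration.

```python
def make_ensemble(models):
    """Combine single-model PDB texts into one MODEL/ENDMDL ensemble.

    Each element of `models` is the full text of one PDB file. Header
    and END records of the inputs are dropped; one END closes the file.
    """
    out = []
    for number, model_text in enumerate(models, start=1):
        # MODEL record: serial number right-justified in columns 11-14.
        out.append("MODEL     %4d" % number)
        for line in model_text.splitlines():
            if line.startswith(("ATOM", "HETATM", "TER")):
                out.append(line)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

The resulting text can be written to a single file and listed in molecules in place of the individual models.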
