The DNA structure error during Protein-DNA docking

Dear HADDOCK developers,

I am following the Protein–DNA docking tutorial with HADDOCK3 (https://www.bonvinlab.org/education/HADDOCK3/HADDOCK3-protein-DNA-basic/) and obtained docking results consistent with the precomputed examples. However, I noticed that the DNA termini in my output structures are not correct.

Specifically:

  1. At the 5′ end, the terminal group is not properly patched. Depending on the intended chemistry:
    • If it should be 5′-OH (the common case), the structure instead has an extra PO₂ group and is missing one hydrogen.

    • If it should be 5′-phosphate, then the terminal phosphate is missing one hydroxyl group (appearing as PO₂ rather than PO₃H).

  2. At the 3′ end, the H3T atom is missing, leaving O3′ unprotonated.
    Together, these issues mean the two termini are effectively missing the equivalent of one H₂O molecule.

From what I understand, the root cause is that the DNA is being treated as a single segment. Because only one segment is allowed, HADDOCK3 does not apply the correct terminal patches (5TER and 3TER). This appears to be the underlying issue, and the structural errors at the ends are consequences of it.

Could you please confirm if this interpretation is correct, and advise on the recommended way to set up DNA input in HADDOCK3 so that proper terminal patches are applied?

Thank you very much for your guidance.

Best regards,

Takashi

Dear Takashi,

Thank you for the detailed description. You’re right, there is an issue with the NA’s terminal residues. This is partially addressed by the 5_phosphate parameter of topoaa, more details at the bottom of this page: Topology - HADDOCK3 User Manual. However, since DNA is indeed treated as a single segment, setting 5_phosphate = True will modify only 2 out of the 4 terminal residues. I’m afraid there isn’t an easy solution at the moment.

Some thoughts:
If we’d split a DNA into two chains, 5_phosphate = True would modify all 4 terminal residues, but then the docking problem would grow from 2-body into 3-body, and I don’t see an easy way to bring DNA into adequate shape while not impacting interaction between the DNA and its docking partner. In principle, it might be possible to use a lot of un-ambiguous restraints (3 per each base pair?) to try to bring DNA strands together, but I’m not sure it’s possible with default electrostatics. Reducing electrostatics might help but will also distort or at least weaken the main interactions. And it will be computationally expensive as well with that many restraints.

Possible solution would be to enable topoaa to use custom lists of NA residues to be modified - separate for 5’ and 3’. We’ll look into it with the team.

If you add a TER statement in the PBD file after the first strand it should help (if I am correct)

Dear Developers,

Thank you very much for your previous explanations—they were very helpful.

I now have a few additional questions that I hope you can kindly clarify.

When examining the precomputed/01_rigidbody/rigidbody_1.pdb.gz file, I noticed that the energy output of the rigidbody stage shows zero values for bonds, angles, dihedrals, and impropers:

REMARK            total,bonds,angles,improper,dihe,vdw,elec,air,cdih,coup,rdcs,vean,dani,xpcs,rg
REMARK energies: 857.922, 0, 0, 0, 0, 44.9973, -3.45878, 814.763, 0, 0, 0, 0, 0, 0, 0

However, the 00_topoaa/OR1_unbound_chain_B_haddock.psf file clearly contains information on bonds, angles, and impropers. I am confused as to why these terms appear as zero in the rigidbody stage, and also why there seems to be no dihedral information in the PSF file in precomputed/00_topoaa. Could you explain how the dihedral energy is computed in this context?

In addition, I am unsure about the distinction between total energy and internal energy of free molecules/complex, since in the rigidbody output I see:

REMARK Internal energy free molecules: 4735.86
REMARK Internal energy complex: 4735.86

In my attempt that trying to follow and reproduce the tutorial, I examine the rigidbody_1.pdb.gz which give following energy values:

REMARK            total,bonds,angles,improper,dihe,vdw,elec,air,cdih,coup,rdcs,vean,dani,xpcs,rg
REMARK energies: 1762.75, 395.903, 3516.97, 1442.83, 759.317, 34.3393, -3.14796, 1722.23, 0, 0, 0, 0, 0, 0, 0

which seems that those energy term is calculated but the total energy is not the add up of those energy. Again, without the dihedral information in the psf file, how could the dihedral energy to be calculated?

I just want to make sure I did not unintentionally make a mistake in my setup, so I would be grateful for your guidance.

Thank you very much for your time and support.

Takashi

When examining the precomputed/01_rigidbody/rigidbody_1.pdb.gz file, I noticed that the energy output of the rigidbody stage shows zero values for bonds, angles, dihedrals, and impropers:

REMARK            total,bonds,angles,improper,dihe,vdw,elec,air,cdih,coup,rdcs,vean,dani,xpcs,rg
REMARK energies: 857.922, 0, 0, 0, 0, 44.9973, -3.45878, 814.763, 0, 0, 0, 0, 0, 0, 0

The docking models in principle only contain the intermolecular energy components (but all terms are used in the computations). Note that we have modified the code to also report the intramolecular energy terms in the header of the PDB files. The precomputed examples must have been generated with an older version of HADDOCK3.

and also why there seems to be no dihedral information in the PSF file in precomputed/00_topoaa. Could you explain how the dihedral energy is computed in this context?

There are indeed no dihedral defined (there are mostly relevant for the phosphate backbone. Instead, for DNA/RNA, we are restraining the dihedral angles around their original values. This is controlled in the modules allowing for flexibility (flexref, emref, mdref, emscoring, mdscoring) by the dnarest_on parameter:


dnarest_on:
default: false
type: boolean
title: Restrain the DNA conformation
short: Automatically creates restraints to maintain the conformation of DNA/RNA
long: "This option allows to restraint the conformation of nucleic acids based on the values from the input structures.
The following restraints will be automatically defined:
- single base planarity
- sugar pucker
- phosphate backbone diherdral angle restraints
- Watson-Crick base pairing"
group: 'other restraints'
explevel: easy

In addition, I am unsure about the distinction between total energy and internal energy of free molecules/complex, since in the rigidbody output I see:

REMARK Internal energy free molecules: 4735.86
REMARK Internal energy complex: 4735.86

I would not use those energies for anything. There are there for historical reasons (and we should probably remove them). At the rigid body stage the two are identical.

Once flexibility is considered (e.g. flexref), you will notice difference. In the case to the “free molecules”, each is minimised alone and their sum is compared to the complex internal energy.

In my attempt that trying to follow and reproduce the tutorial, I examine the rigidbody_1.pdb.gz which give following energy values:

REMARK            total,bonds,angles,improper,dihe,vdw,elec,air,cdih,coup,rdcs,vean,dani,xpcs,rg
REMARK energies: 1762.75, 395.903, 3516.97, 1442.83, 759.317, 34.3393, -3.14796, 1722.23, 0, 0, 0, 0, 0, 0, 0

which seems that those energy term is calculated but the total energy is not the add up of those energy.

The total neglects the intramolecular terms.

Again, without the dihedral information in the psf file, how could the dihedral energy to be calculated?

There are dihedrals defined on the protein side.