Glycoprotein numbering

I’m trying to dock a protein to a glycoprotein. In my glycoprotein, chain A contains atoms numbered 1-9192. The glycan residues follow as HETATM records labelled with chain G. For example:

HETATM 9193 C1 NAG G2000 115.116 143.737 183.737 0.00 0.00 C
HETATM 9194 C2 NAG G2000 114.443 144.217 185.063 0.00 0.00 C
HETATM 9195 C3 NAG G2000 113.283 145.209 184.736 0.00 0.00 C
HETATM 9196 C4 NAG G2000 113.757 146.357 183.823 0.00 0.00 C
HETATM 9197 C5 NAG G2000 114.434 145.798 182.541 0.00 0.00 C

I then tried to use the webserver to dock the glycoprotein and my other protein. However, I’m getting the error:

Error: Error in PDB file. Your PDB contains multiple residues with number 2000 in chain G or duplicated atom names.

I don’t think it’s recognizing that the different atoms belong to the same residue (residue 2000 in this case). I checked several glycoproteins from the PDB site, and their sugar-tree atoms appear to be in the same format. Am I missing something, or is there a workaround for this error? Do I need to give every line a different residue number? Thanks in advance!

First of all, if your glycan is attached to chain A, it should also have chain A as its chain ID, not chain G, and its residue numbering should not overlap with that of the protein.

Further, check that no atom name is repeated within the same residue number.
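Both checks above can be scripted before uploading. The following is only a minimal sketch, assuming the file uses standard fixed-column PDB records; `check_pdb_residues` and the `hetatm` helper are hypothetical names for illustration and are not part of HADDOCK or its server.

```python
from collections import Counter, defaultdict

def check_pdb_residues(pdb_lines):
    """Report duplicated atom names and residue numbers shared by different residue names."""
    atom_counts = Counter()
    resnames = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()      # atom name, columns 13-16
        altloc = line[16:17].strip()    # alternate location, column 17
        resname = line[17:20].strip()   # residue name, columns 18-20
        chain = line[21:22]             # chain ID, column 22
        resseq = line[22:27].strip()    # residue number plus insertion code
        atom_counts[(chain, resseq, altloc, name)] += 1
        resnames[(chain, resseq)].add(resname)
    dup_atoms = sorted(k for k, n in atom_counts.items() if n > 1)
    mixed = {k: sorted(v) for k, v in resnames.items() if len(v) > 1}
    return dup_atoms, mixed

def hetatm(serial, name, resname, chain, resseq):
    """Build a minimal fixed-column HETATM record (coordinates are dummies)."""
    return "HETATM%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0)

# Made-up records that reproduce both problems on residue 2000 of chain G.
sample = [
    hetatm(9193, "C1", "NAG", "G", 2000),
    hetatm(9194, "C1", "NAG", "G", 2000),  # same atom name repeated
    hetatm(9195, "C1", "BMA", "G", 2000),  # different residue, same number
]
dup_atoms, mixed = check_pdb_residues(sample)
print(dup_atoms)
print(mixed)
```

An empty list and an empty dictionary mean that neither problem was found. It does not check whether the glycan numbering overlaps with the protein numbering once both are in the same chain, although overlapping numbers with different residue names would show up in `mixed`.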

Dear Prof Bonvin,

Yes, I did realise there was an error in my atom naming. Thanks also for pointing out the chain name.

I’ve now tried to run the docking, but I’m having issues with the sugar tree itself. The first NAG-Asn bond seems to be in place, but the remaining residues in the sugar tree are blown far away from their original positions, even though they were covalently bonded. The exact sequence of my sugar tree is NAG-NAG-BMA-MAN-NAG-MAN, so I know the residues should be supported. I think this issue was addressed before:

However, in that thread it was mentioned that HADDOCK now supports glycosylated proteins. My assumption was therefore that I didn’t need to do anything, since HADDOCK would automatically recognize the sugar tree. So I’m wondering whether I now need to restrain the sugar tree as a ligand, and whether I need to give all the residues a common name, i.e. not have differently named residues like NAG/BMA etc., and then use ligand restraints to maintain the sugar tree.

Lastly, I should add that I built the sugar tree in silico using glyprot, since the actual structure hasn’t been solved for this case. I don’t think that should make a difference, but I thought it was worth mentioning in case I’m doing something wrong.

The list of supported glycans can be found at the following link:

https://wenmr.science.uu.nl/haddock2.4/library

The server does support glycans and has successfully done so in many cases.

But many things can go wrong, e.g. different atom naming or unsupported linkage types.
It is difficult to tell what the problem is with your PDB file without looking at it.
So unless you share your PDB with us (you can do so by direct email), we can’t really help, I am afraid.

Another possible server to build glycans is GLYCAM (http://glycam.org)

OK - found the problem (I think) - your atom naming is non-standard. The oxygens are not recognised, and this prevents the linkages from being made.

Try the glycam server to build your glycan.

Here is for example a building block for NAG:

RESIdue NAG  !2-N-Acetyl-beta-D-glucopyranose
 GROUP
  ATOM C1   TYPE=CCE  CHARge= 0.350   END
  ATOM H1   TYPE=HAS  CHARge= 0.100   END
  ATOM O1   TYPE=OH1  CHARge=-0.650   END
  ATOM HO1  TYPE=H    CHARge= 0.400   END
  ATOM C2   TYPE=CCS  CHARge= 0.000   END
  ATOM H2   TYPE=HAS  CHARge= 0.100   END
  ATOM N2   TYPE=NH1  CHARge=-0.350   END
  ATOM HN2  TYPE=H    CHARge= 0.250   END
  ATOM C7   TYPE=C    CHARge= 0.550   END
  ATOM O7   TYPE=O    CHARge=-0.550   END
  ATOM C8   TYPE=CCS  CHARge=-0.300   END
  ATOM H81  TYPE=HAS  CHARge= 0.100   END
  ATOM H82  TYPE=HAS  CHARge= 0.100   END
  ATOM H83  TYPE=HAS  CHARge= 0.100   END
  ATOM C3   TYPE=CCS  CHARge= 0.150   END
  ATOM H3   TYPE=HAS  CHARge= 0.100   END
  ATOM O3   TYPE=OH1  CHARge=-0.650   END
  ATOM HO3  TYPE=H    CHARge= 0.400   END
  ATOM C4   TYPE=CCS  CHARge= 0.150   END
  ATOM H4   TYPE=HAS  CHARge= 0.100   END
  ATOM O4   TYPE=OH1  CHARge=-0.650   END
  ATOM HO4  TYPE=H    CHARge= 0.400   END
  ATOM C5   TYPE=CCS  CHARge= 0.100   END
  ATOM H5   TYPE=HAS  CHARge= 0.100   END
  ATOM O5   TYPE=OES  CHARge=-0.400   END
  ATOM C6   TYPE=CC6  CHARge= 0.050   END
  ATOM H61  TYPE=HAS  CHARge= 0.100   END
  ATOM H62  TYPE=HAS  CHARge= 0.100   END
  ATOM O6   TYPE=OH1  CHARge=-0.650   END
  ATOM HO6  TYPE=H    CHARge= 0.400   END

No need to worry about hydrogens, but your heavy atoms and especially the oxygens should be properly named.
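A quick way to spot non-standard heavy-atom names is to compare each NAG residue against the names in the building block above. This is a rough sketch only: `unexpected_atom_names` is a hypothetical helper, and treating every name that starts with `H` as a hydrogen is an assumption that holds for the naming shown above but not for every PDB file.

```python
# Heavy-atom names taken from the NAG building block shown above.
NAG_HEAVY_ATOMS = {
    "C1", "O1", "C2", "N2", "C7", "O7", "C8",
    "C3", "O3", "C4", "O4", "C5", "O5", "C6", "O6",
}

def unexpected_atom_names(atom_names, expected=NAG_HEAVY_ATOMS):
    """Return the heavy-atom names that do not appear in the expected set."""
    # Assumption: hydrogen names start with "H", as in the topology above.
    heavy = {name for name in atom_names if not name.startswith("H")}
    return sorted(heavy - expected)

# Atom names of one residue, here with GLYCAM-style acetyl names mixed in.
residue_atoms = ["C1", "C2", "N2", "C2N", "O2N", "CME", "H1"]
print(unexpected_atom_names(residue_atoms))
```

Any name it prints is a candidate for renaming. The same comparison works for other residues (BMA, MAN, ...) once their expected names are taken from the corresponding building blocks.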

Dear Prof Bonvin,

Thank you for your reply. I tried the GLYCAM webserver, but it uses its own GLYCAM convention for naming both the residues and the atoms in the sugar residues. For instance, NAG is named 4YB; NAG has a C8 atom, whereas 4YB does not and instead has something like C2N. I don’t think the GLYCAM format as such is recognized by HADDOCK. If I rename the residues to the standard PDB names, e.g. NAG and MAN, HADDOCK seems to recognize them and the sugar tree is maintained in the docking run. However, HADDOCK then also appears to rename the atoms to the convention you have written above. I want to run a simulation on the docked structure, so I need to build appropriate topology files for the glycan residues. The only workaround I’ve seen is to use a GLYCAM force field adapted for use with AMBER. My main question is whether there is an easy way to convert between the PDB and GLYCAM formats, or whether HADDOCK can directly recognize the GLYCAM format for docking. I’d really appreciate any help with this.

Well - we follow the official PDB naming. I guess you will have to rename the atoms.

It is complex enough already to have the proper topologies and parameters; we can’t start supporting all kinds of nomenclatures.
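Renaming can be done with a small script that rewrites the residue-name and atom-name columns of each record. What follows is a sketch under stated assumptions, not a complete converter: only the 4YB to NAG residue mapping comes from this thread, the acetyl atom mapping (C2N, O2N, CME to C7, O7, C8) is an assumption that should be checked against your own files, and `rename_line` is a hypothetical helper. Other GLYCAM residue codes and hydrogen names are not covered.

```python
# Residue mapping mentioned in this thread; extend it for the other GLYCAM codes.
RESNAME_MAP = {"4YB": "NAG"}
# ASSUMPTION: GLYCAM acetyl atom names and their PDB counterparts; verify before use.
ATOM_MAP = {"NAG": {"C2N": "C7", "O2N": "O7", "CME": "C8"}}

def rename_line(line, resname_map=RESNAME_MAP, atom_map=ATOM_MAP):
    """Rename the residue and atom of one fixed-column ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    resname = line[17:20].strip()
    new_res = resname_map.get(resname, resname)
    name = line[12:16].strip()
    new_name = atom_map.get(new_res, {}).get(name, name)
    # Names shorter than four characters start in column 14 (one-letter elements).
    name_field = new_name if len(new_name) == 4 else " " + new_name.ljust(3)
    return line[:12] + name_field + line[16:17] + new_res.rjust(3) + line[20:]

# Made-up GLYCAM-style record used only to show the effect.
example = "HETATM%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
    1, " C2N", "4YB", "A", 500, 1.0, 2.0, 3.0)
renamed = rename_line(example)
print(renamed)
```

Only the two name fields change; the serial number, chain, residue number and coordinates are copied through unchanged, so the same function can be applied line by line to a whole file.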