mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Correct Chain Seperation in PDB

Open kain88-de opened this issue 9 years ago • 12 comments

When we write a PDB file with multiple frames it is not possible to read the frames correctly using the BioPython PDBParser. We use ENDMDL to finish a frame but I have more commonly seen that others use a TER to end a frame.

I haven't checked yet what gromacs does when I convert a trajectory to PDB or what the BioPython writer produces.

To see what is wrong

>>> pdb = mda.coordinates.PDB.PDBReader('test.pdb')
>>> pdb.n_atoms
25

The number of atoms should be 5.

test.pdb

HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         1
ATOM      1  CA  MET     1       0.000   1.000   2.000  1.00 84.71           C
ATOM      2  CA  ARG     2       3.000   4.000   5.000  1.00 68.59           C
ATOM      3  CA  ILE     3       6.000   7.000   8.000  1.00 48.30           C
ATOM      4  CA  LYS     4       9.000  10.000  11.000  1.00 49.38           C
ATOM      5  CA  LEU     5      12.000  13.000  14.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         2
ATOM      1  CA  MET     1       0.000   2.000   4.000  1.00 84.71           C
ATOM      2  CA  ARG     2       6.000   8.000  10.000  1.00 68.59           C
ATOM      3  CA  ILE     3      12.000  14.000  16.000  1.00 48.30           C
ATOM      4  CA  LYS     4      18.000  20.000  22.000  1.00 49.38           C
ATOM      5  CA  LEU     5      24.000  26.000  28.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         3
ATOM      1  CA  MET     1       0.000   4.000   8.000  1.00 84.71           C
ATOM      2  CA  ARG     2      12.000  16.000  20.000  1.00 68.59           C
ATOM      3  CA  ILE     3      24.000  28.000  32.000  1.00 48.30           C
ATOM      4  CA  LYS     4      36.000  40.000  44.000  1.00 49.38           C
ATOM      5  CA  LEU     5      48.000  52.000  56.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         4
ATOM      1  CA  MET     1       0.000   8.000  16.000  1.00 84.71           C
ATOM      2  CA  ARG     2      24.000  32.000  40.000  1.00 68.59           C
ATOM      3  CA  ILE     3      48.000  56.000  64.000  1.00 48.30           C
ATOM      4  CA  LYS     4      72.000  80.000  88.000  1.00 49.38           C
ATOM      5  CA  LEU     5      96.000 104.000 112.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         5
ATOM      1  CA  MET     1       0.000  16.000  32.000  1.00 84.71           C
ATOM      2  CA  ARG     2      48.000  64.000  80.000  1.00 68.59           C
ATOM      3  CA  ILE     3      96.000 112.000 128.000  1.00 48.30           C
ATOM      4  CA  LYS     4     144.000 160.000 176.000  1.00 49.38           C
ATOM      5  CA  LEU     5     192.000 208.000 224.000  1.00 42.10           C
ENDMDL
END

kain88-de avatar Feb 23 '16 09:02 kain88-de

From what I see (http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ENDMDL, http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER), the best is probably to have both.

jbarnoud avatar Feb 23 '16 12:02 jbarnoud

When we implemented it we went with what the PDB standard said for multi frame files (i.e. NMR), which means MODEL records..

By the way, I am pretty sure that your PDB example is illegal format because I read HEADER and TITLE that there can only be one of these records and not have TITLE and HEADER records in each model; this is said explicitly under record types where HEADER is listed as a record of which only a single line may occur.

orbeckst avatar Feb 24 '16 00:02 orbeckst

... so if the PDB "trajectory" was produced with MDAnalysis then that's a bug.

orbeckst avatar Feb 24 '16 00:02 orbeckst

You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.

kain88-de avatar Feb 24 '16 07:02 kain88-de

I think you can simply change the issue title and label it as defect.

Oliver Beckstein email: [email protected]

Am Feb 24, 2016 um 0:10 schrieb kain88-de [email protected]:

You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.

— Reply to this email directly or view it on GitHub.

orbeckst avatar Feb 24 '16 16:02 orbeckst

No please open another issue. The header thing has nothing to do with the frame separation.

kain88-de avatar Feb 24 '16 17:02 kain88-de

This is a good bug if you want to get started contributing MDAnalysis. The Problem is that we only write a ENDMDL record to distinguish between frames while we should write a ENDMDL and TER record.

This can be fixed in MDAnalysis/coordinates/PDB.py:PrimitivePDBWriter

kain88-de avatar Feb 28 '16 21:02 kain88-de

I am working on this bug. What I understand is that the TER record should be written (before or after every ENDML, doesn't matter, right?)

backpropper avatar Mar 11 '16 05:03 backpropper

It does matter that the TER is before the ENDMDL. The TER closes a chain and is part of that chain. ENDMDL closes a model, that contains one or several chains, everything that is part of the model should be between the MODEL and the ENDMDL records.

jbarnoud avatar Mar 11 '16 07:03 jbarnoud

Since the TER issue is now fixed, is the bug solved?

backpropper avatar Mar 13 '16 08:03 backpropper

Renamed the bug because it is about the TER record to separate chains

kain88-de avatar Mar 17 '16 21:03 kain88-de

Is this issue still relevant? I would like to try.

dim-99 avatar Jul 23 '24 16:07 dim-99

@dim-99 This issue probably has been solved. I tested the example https://github.com/MDAnalysis/mdanalysis/issues/734#issue-135678152 with MDAnalysis 2.9.0.

The results:

>>> pdb = mda.coordinates.PDB.PDBReader('test.pdb')
>>> pdb.n_atoms
5

The document also shows how it deals with multiple models: https://docs.mdanalysis.org/2.9.0/documentation_pages/coordinates/PDB.html#MDAnalysis.coordinates.PDB.PDBReader

If the pdb file contains multiple MODEL records then it is read as a trajectory where the MODEL numbers correspond to frame numbers.

I think this issue could be closed.

yuyuan871111 avatar Mar 20 '25 12:03 yuyuan871111

Thank you @yuyuan871111 . I confirm. This is solved and should have been closed a long time ago.

orbeckst avatar Mar 20 '25 18:03 orbeckst