Correct Chain Seperation in PDB
When we write a PDB file with multiple frames it is not possible to read the frames correctly using the BioPython PDBParser. We use ENDMDL to finish a frame but I have more commonly seen that others use a TER to end a frame.
I haven't checked yet what gromacs does when I convert a trajectory to PDB or what the BioPython writer produces.
To see what is wrong
>>> pdb = mda.coordinates.PDB.PDBReader('test.pdb')
>>> pdb.n_atoms
25
The number of atoms should be 5.
test.pdb
HEADER
TITLE MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1 80.000 80.000 80.000 60.00 60.00 90.00 P 1 1
MODEL 1
ATOM 1 CA MET 1 0.000 1.000 2.000 1.00 84.71 C
ATOM 2 CA ARG 2 3.000 4.000 5.000 1.00 68.59 C
ATOM 3 CA ILE 3 6.000 7.000 8.000 1.00 48.30 C
ATOM 4 CA LYS 4 9.000 10.000 11.000 1.00 49.38 C
ATOM 5 CA LEU 5 12.000 13.000 14.000 1.00 42.10 C
ENDMDL
HEADER
TITLE MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1 80.000 80.000 80.000 60.00 60.00 90.00 P 1 1
MODEL 2
ATOM 1 CA MET 1 0.000 2.000 4.000 1.00 84.71 C
ATOM 2 CA ARG 2 6.000 8.000 10.000 1.00 68.59 C
ATOM 3 CA ILE 3 12.000 14.000 16.000 1.00 48.30 C
ATOM 4 CA LYS 4 18.000 20.000 22.000 1.00 49.38 C
ATOM 5 CA LEU 5 24.000 26.000 28.000 1.00 42.10 C
ENDMDL
HEADER
TITLE MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1 80.000 80.000 80.000 60.00 60.00 90.00 P 1 1
MODEL 3
ATOM 1 CA MET 1 0.000 4.000 8.000 1.00 84.71 C
ATOM 2 CA ARG 2 12.000 16.000 20.000 1.00 68.59 C
ATOM 3 CA ILE 3 24.000 28.000 32.000 1.00 48.30 C
ATOM 4 CA LYS 4 36.000 40.000 44.000 1.00 49.38 C
ATOM 5 CA LEU 5 48.000 52.000 56.000 1.00 42.10 C
ENDMDL
HEADER
TITLE MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1 80.000 80.000 80.000 60.00 60.00 90.00 P 1 1
MODEL 4
ATOM 1 CA MET 1 0.000 8.000 16.000 1.00 84.71 C
ATOM 2 CA ARG 2 24.000 32.000 40.000 1.00 68.59 C
ATOM 3 CA ILE 3 48.000 56.000 64.000 1.00 48.30 C
ATOM 4 CA LYS 4 72.000 80.000 88.000 1.00 49.38 C
ATOM 5 CA LEU 5 96.000 104.000 112.000 1.00 42.10 C
ENDMDL
HEADER
TITLE MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1 80.000 80.000 80.000 60.00 60.00 90.00 P 1 1
MODEL 5
ATOM 1 CA MET 1 0.000 16.000 32.000 1.00 84.71 C
ATOM 2 CA ARG 2 48.000 64.000 80.000 1.00 68.59 C
ATOM 3 CA ILE 3 96.000 112.000 128.000 1.00 48.30 C
ATOM 4 CA LYS 4 144.000 160.000 176.000 1.00 49.38 C
ATOM 5 CA LEU 5 192.000 208.000 224.000 1.00 42.10 C
ENDMDL
END
From what I see (http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ENDMDL, http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER), the best is probably to have both.
When we implemented it we went with what the PDB standard said for multi frame files (i.e. NMR), which means MODEL records..
By the way, I am pretty sure that your PDB example is illegal format because I read HEADER and TITLE that there can only be one of these records and not have TITLE and HEADER records in each model; this is said explicitly under record types where HEADER is listed as a record of which only a single line may occur.
... so if the PDB "trajectory" was produced with MDAnalysis then that's a bug.
You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.
I think you can simply change the issue title and label it as defect.
Oliver Beckstein email: [email protected]
Am Feb 24, 2016 um 0:10 schrieb kain88-de [email protected]:
You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.
— Reply to this email directly or view it on GitHub.
No please open another issue. The header thing has nothing to do with the frame separation.
This is a good bug if you want to get started contributing MDAnalysis. The Problem is that we only write a ENDMDL record to distinguish between frames while we should write a ENDMDL and TER record.
This can be fixed in MDAnalysis/coordinates/PDB.py:PrimitivePDBWriter
I am working on this bug. What I understand is that the TER record should be written (before or after every ENDML, doesn't matter, right?)
It does matter that the TER is before the ENDMDL. The TER closes a chain and is part of that chain. ENDMDL closes a model, that contains one or several chains, everything that is part of the model should be between the MODEL and the ENDMDL records.
Since the TER issue is now fixed, is the bug solved?
Renamed the bug because it is about the TER record to separate chains
Is this issue still relevant? I would like to try.
@dim-99 This issue probably has been solved. I tested the example https://github.com/MDAnalysis/mdanalysis/issues/734#issue-135678152 with MDAnalysis 2.9.0.
The results:
>>> pdb = mda.coordinates.PDB.PDBReader('test.pdb')
>>> pdb.n_atoms
5
The document also shows how it deals with multiple models: https://docs.mdanalysis.org/2.9.0/documentation_pages/coordinates/PDB.html#MDAnalysis.coordinates.PDB.PDBReader
If the pdb file contains multiple MODEL records then it is read as a trajectory where the MODEL numbers correspond to frame numbers.
I think this issue could be closed.
Thank you @yuyuan871111 . I confirm. This is solved and should have been closed a long time ago.