PDBs from RCSB failed to reconstruct
Hi,
I have been testing PDBFixer for some time and it works great in most cases. Unfortunately, I have also found several PDBs for which the missing residues are not reconstructed (6RPA, 6RPB and 1AO7). PDBFixer does not find the missing parts of the backbone and therefore does not reconstruct them. I have tried several approaches. First, I checked that the SEQRES records are well defined (I even created my own SEQRES), and they were. Then I realized that the amino acids in these PDBs are not numbered consecutively within the continuous (non-missing) parts, so I fixed that as well. I also removed the letters that mark variant mutations introduced in the experiments (relative to the reference sequence). Unfortunately, none of this worked, and I am now going through the code to find the reason. But maybe you could help. Any suggestions and help would be much appreciated!
All the best, Sławek
I downloaded 6RPA, and it looks to me like there are problems in the residue numbering. Take a look at chain D. The SEQRES and ATOM records match through residue 29 (GLY). Then the residue number jumps to 36, indicating six missing residues. But they don't appear in the SEQRES records. It carries right on at residue 36 as if nothing were there.
As a result, PDBFixer can't figure out any way to match up the sequence of chain D to the SEQRES records. If it can't align them, then it can't identify what residues are missing.
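To make the mismatch described above concrete, here is a minimal sketch that uses only the Python standard library. It is not PDBFixer's actual algorithm. The helper names (`read_chain_residues`, `read_seqres`, `numbering_gaps`, `is_contiguous_run`) and the toy records are made up for illustration; the toy records only mimic the residue 29 to 36 jump in chain D. The sketch reports jumps in the ATOM numbering and checks whether the observed residues still form one uninterrupted run inside SEQRES, which would suggest the jump is a numbering convention and not a real gap.

```python
from collections import defaultdict

def read_chain_residues(pdb_lines):
    """Collect (resSeq, iCode, resName) per chain from fixed-width ATOM records."""
    chains = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (int(line[22:26]), line[26:27].strip(), line[17:20].strip())
        residues = chains[line[21]]
        if not residues or residues[-1] != key:  # one entry per residue
            residues.append(key)
    return dict(chains)

def read_seqres(pdb_lines):
    """Collect the residue names listed in SEQRES records, per chain."""
    seqres = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            seqres[line[11]].extend(line[19:].split())
    return dict(seqres)

def numbering_gaps(residues):
    """Return (previous, current, implied_missing) for every jump in resSeq."""
    return [(prev[0], cur[0], cur[0] - prev[0] - 1)
            for prev, cur in zip(residues, residues[1:])
            if cur[0] - prev[0] > 1]

def is_contiguous_run(observed, full):
    """True if `observed` appears as one uninterrupted slice of `full`."""
    n = len(observed)
    return any(full[i:i + n] == observed for i in range(len(full) - n + 1))

# Hypothetical toy records (not copied from 6RPA), CA atoms only.
toy_pdb = [
    "SEQRES   1 D    6  VAL SER GLY ASN PRO TYR",
    "ATOM      1  CA  SER D  28 ",
    "ATOM      2  CA  GLY D  29 ",
    "ATOM      3  CA  ASN D  36 ",
    "ATOM      4  CA  PRO D  37 ",
]

residues = read_chain_residues(toy_pdb)["D"]
gaps = numbering_gaps(residues)
contiguous = is_contiguous_run([name for _, _, name in residues],
                               read_seqres(toy_pdb)["D"])
print(gaps, contiguous)
```

In this toy case the numbering implies six missing residues between 29 and 36, yet the observed residues sit back to back in SEQRES, so nothing is actually missing there. A tool that relies on residue numbers to place the modeled residues within SEQRES would be misled by exactly this kind of jump.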
Hi Peter, thank you for your answer. Yes, that is correct. But no residues are missing between 29 and 36; the jump is just a consequence of numbering the residues according to the IMGT standard. The SEQRES properly represents the structure. Nevertheless, in an attempt to solve it, I renumbered the residues consecutively in the input PDB for PDBFixer, but that did not resolve the problem. Another possible issue was that in the 29-36 region some residue numbers end with the letter "A", which indicates the construct's mutations. I removed these from the input PDB as well, and yet the missing residues are still not reconstructed. So the issue is not resolved. Cheers, Sławek
Can you post your modified PDB file where you've fixed those problems? I can investigate what else is happening.
Hi, sorry for the late response; I was down with Covid and unable to follow up on the issue. The 6RPA PDB with fixed numbering is attached. 6RPA_numbers_fixed.zip
PDBFixer should make use of the _pdbx_poly_seq_scheme records in the mmCIF format rather than the SEQRES records in the legacy PDB format.
loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.entity_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.ndb_seq_num
_pdbx_poly_seq_scheme.pdb_seq_num
_pdbx_poly_seq_scheme.auth_seq_num
_pdbx_poly_seq_scheme.pdb_mon_id
_pdbx_poly_seq_scheme.auth_mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
_pdbx_poly_seq_scheme.pdb_ins_code
_pdbx_poly_seq_scheme.hetero
...
D 4 1 MET 1 0 ? ? ? D . n
D 4 2 ALA 2 1 1 ALA ALA D . n
D 4 3 GLN 3 2 2 GLN GLN D . n
D 4 4 SER 4 3 3 SER SER D . n
D 4 5 VAL 5 4 4 VAL VAL D . n
D 4 6 ALA 6 5 5 ALA ALA D . n
D 4 7 GLN 7 6 6 GLN GLN D . n
D 4 8 PRO 8 7 7 PRO PRO D . n
D 4 9 GLU 9 8 8 GLU GLU D . n
D 4 10 ASP 10 9 9 ASP ASP D . n
D 4 11 GLN 11 10 10 GLN GLN D . n
D 4 12 VAL 12 11 11 VAL VAL D . n
D 4 13 ASN 13 12 12 ASN ASN D . n
D 4 14 VAL 14 13 13 VAL VAL D . n
D 4 15 ALA 15 14 14 ALA ALA D . n
D 4 16 GLU 16 15 15 GLU GLU D . n
D 4 17 GLY 17 16 16 GLY GLY D . n
D 4 18 ASN 18 17 17 ASN ASN D . n
D 4 19 PRO 19 18 18 PRO PRO D . n
D 4 20 LEU 20 19 19 LEU LEU D . n
D 4 21 THR 21 20 20 THR THR D . n
D 4 22 VAL 22 21 21 VAL VAL D . n
D 4 23 LYS 23 22 22 LYS LYS D . n
D 4 24 CYS 24 23 23 CYS CYS D . n
D 4 25 THR 25 24 24 THR THR D . n
D 4 26 TYR 26 25 25 TYR TYR D . n
D 4 27 SER 27 26 26 SER SER D . n
D 4 28 VAL 28 27 27 VAL VAL D . n
D 4 29 SER 29 28 28 SER SER D . n
D 4 30 GLY 30 29 29 GLY GLY D . n
D 4 31 ASN 31 36 36 ASN ASN D . n <---------------------------
D 4 32 PRO 32 37 37 PRO PRO D . n
D 4 33 TYR 33 38 38 TYR TYR D . n
D 4 34 LEU 34 39 39 LEU LEU D . n
D 4 35 PHE 35 40 40 PHE PHE D . n
D 4 36 TRP 36 41 41 TRP TRP D . n
D 4 37 TYR 37 42 42 TYR TYR D . n
D 4 38 VAL 38 43 43 VAL VAL D . n
D 4 39 GLN 39 44 44 GLN GLN D . n
D 4 40 TYR 40 45 45 TYR TYR D . n
D 4 41 PRO 41 46 46 PRO PRO D . n
D 4 42 ASN 42 47 47 ASN ASN D . n
...
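Here, a ? in the _pdbx_poly_seq_scheme.auth_seq_num column indicates a missing/unmodeled residue, and the arrow marks the row where the author numbering jumps from 29 to 36 while seq_id stays consecutive (30 to 31).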
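As a rough illustration of how this table could be read, here is a small standard-library sketch. The function names (`parse_rows`, `unmodeled`, `numbering_jumps`) are hypothetical and this is not PDBFixer code. It assumes the rows are plain whitespace-separated values with no quoting, which holds for the excerpt above but not for mmCIF in general, so a real implementation would need a proper mmCIF parser. The rows are a subset of the chain D excerpt above.

```python
COLUMNS = [
    "asym_id", "entity_id", "seq_id", "mon_id", "ndb_seq_num", "pdb_seq_num",
    "auth_seq_num", "pdb_mon_id", "auth_mon_id", "pdb_strand_id",
    "pdb_ins_code", "hetero",
]

# Subset of the chain D rows quoted above.
EXCERPT = """\
D 4 1 MET 1 0 ? ? ? D . n
D 4 2 ALA 2 1 1 ALA ALA D . n
D 4 29 SER 29 28 28 SER SER D . n
D 4 30 GLY 30 29 29 GLY GLY D . n
D 4 31 ASN 31 36 36 ASN ASN D . n
D 4 32 PRO 32 37 37 PRO PRO D . n
"""

def parse_rows(text):
    """Split each row on whitespace and label the values with COLUMNS."""
    return [dict(zip(COLUMNS, line.split()))
            for line in text.splitlines() if line.strip()]

def unmodeled(rows):
    """Residues in the full sequence that have no coordinates (auth_seq_num is ?)."""
    return [(int(r["seq_id"]), r["mon_id"])
            for r in rows if r["auth_seq_num"] == "?"]

def numbering_jumps(rows):
    """Adjacent seq_id pairs whose author numbering is not consecutive.

    These are numbering conventions (for example IMGT), not missing residues,
    because seq_id itself has no hole at that position.
    """
    jumps = []
    for prev, cur in zip(rows, rows[1:]):
        if "?" in (prev["auth_seq_num"], cur["auth_seq_num"]):
            continue  # skip unmodeled residues
        adjacent = int(cur["seq_id"]) - int(prev["seq_id"]) == 1
        if adjacent and int(cur["auth_seq_num"]) - int(prev["auth_seq_num"]) != 1:
            jumps.append((int(prev["seq_id"]), int(prev["auth_seq_num"]),
                          int(cur["seq_id"]), int(cur["auth_seq_num"])))
    return jumps

rows = parse_rows(EXCERPT)
print(unmodeled(rows))
print(numbering_jumps(rows))
```

The point of the sketch is that seq_id gives an unambiguous position in the full sequence for every residue, so the missing residues can be read off directly without aligning ATOM numbering against SEQRES.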
Hoping everyone is alright.
I'm not sure what you mean. His input file is a PDB, not a PDBx/mmCIF.
Has anyone figured out how to solve the problem? I think 5J7S is problematic too.