BioStructures.jl icon indicating copy to clipboard operation
BioStructures.jl copied to clipboard

Error loading `*.pdb` files generated by OpenBabel or `AtomsBase.jl`-compatible packages

Open hellmrf opened this issue 8 months ago • 2 comments

I'm getting an error while reading basically any file generated by OpenBabel.

ERROR: Two copies of the same atom have the same alternative location ID.

The same error is also thrown for some *.pdb files generated by Chemfiles.jl, PDBTools.jl or AtomsIO.jl (AtomsBase.jl-compatible).

Open Babel

Let's use obabel to generate some files:

obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.pdb --gen3d
obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.mmcif --gen3d
obabel -:"CC(=O)NC1=CC=C(C=C1)O" -O paracetamol.pdb --gen3d
obabel *.pdb -O multi.pdb

And now we have aspirin.pdb, aspirin.mmcif, paracetamol.pdb, and multi.pdb (or download.zip).

Chemfiles.jl

For example, with Chemfiles.jl, we can load them:

julia> using Chemfiles

julia> trajectory = Trajectory("multi.pdb")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000009294110, false), nothing)

julia> trajectory |> length |> Int
2

julia> frame = read(trajectory)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000002417e60, false))

julia> positions(frame)
3×21 Chemfiles.ChemfilesArray:
 1.071  2.533  2.977   3.273   4.679  5.417  6.803   7.457   6.728   5.335   4.64    3.521   5.085  0.494   0.787  0.852  4.928  7.37    8.536   7.252   3.312
 0.277  0.556  1.556  -0.507  -0.5    0.658  0.652  -0.51   -1.679  -1.69   -2.983  -3.163  -3.843  1.136  -0.601  0.128  1.576  1.563  -0.507  -2.585  -2.419
 1.264  1.454  2.0     0.924   1.053  0.783  0.939   1.343   1.57    1.409   1.648   0.915   2.389  1.62    1.849  0.206  0.47   0.761   1.48    1.873   0.319

julia> f0 = read_step(trajectory, 0)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009cde290, false))

julia> f1 = read_step(trajectory, 1)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009c23c10, false))

julia> positions(f0)
3×21 Chemfiles.ChemfilesArray:
 1.071  2.533  2.977   3.273   4.679  5.417  6.803   7.457   6.728   5.335   4.64    3.521   5.085  0.494   0.787  0.852  4.928  7.37    8.536   7.252   3.312
 0.277  0.556  1.556  -0.507  -0.5    0.658  0.652  -0.51   -1.679  -1.69   -2.983  -3.163  -3.843  1.136  -0.601  0.128  1.576  1.563  -0.507  -2.585  -2.419
 1.264  1.454  2.0     0.924   1.053  0.783  0.939   1.343   1.57    1.409   1.648   0.915   2.389  1.62    1.849  0.206  0.47   0.761   1.48    1.873   0.319

julia> positions(f1)
3×20 Chemfiles.ChemfilesArray:
  1.078  2.546  2.933   3.328   4.731   5.51    6.902   7.515   6.757   5.364   8.87   0.545   0.684   0.914   2.864  5.061  7.48    7.246   4.79    9.26
 -0.099  0.135  1.177  -0.932  -1.036  -0.175  -0.304  -1.3    -2.184  -2.059  -1.44   0.855  -0.775  -0.523  -1.669  0.606  0.383  -2.962  -2.754  -0.721
  0.096  0.344  0.857  -0.067   0.04    0.816   0.848   0.099  -0.656  -0.678   0.077  0.152   0.859  -0.899  -0.582  1.422  1.457  -1.236  -1.285   0.601

julia> positions(frame) == positions(f0)
true

PDBTools.jl

With PDBTools.jl (which is AtomsBase.jl/AtomsIO.jl-compatible), everything runs smooth:

julia> using PDBTools

julia> asp = read_pdb("aspirin.pdb")
   Vector{Atom{Nothing}} with 21 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8
⋮
      14    H     UNL     X        1        1    0.494    1.136    1.620  1.00  0.00     1       X        14
      15    H     UNL     X        1        1    0.787   -0.601    1.849  1.00  0.00     1       X        15
      16    H     UNL     X        1        1    0.852    0.128    0.206  1.00  0.00     1       X        16
      17    H     UNL     X        1        1    4.928    1.576    0.470  1.00  0.00     1       X        17
      18    H     UNL     X        1        1    7.370    1.563    0.761  1.00  0.00     1       X        18
      19    H     UNL     X        1        1    8.536   -0.507    1.480  1.00  0.00     1       X        19
      20    H     UNL     X        1        1    7.252   -2.585    1.873  1.00  0.00     1       X        20
      21    H     UNL     X        1        1    3.312   -2.419    0.319  1.00  0.00     1       X        21

julia> traj = read_pdb("multi.pdb")
   Vector{Atom{Nothing}} with 41 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8
⋮
      35    H     UNL     X        1        2    0.914   -0.523   -0.899  1.00  0.00     2       X        14
      36    H     UNL     X        1        2    2.864   -1.669   -0.582  1.00  0.00     2       X        15
      37    H     UNL     X        1        2    5.061    0.606    1.422  1.00  0.00     2       X        16
      38    H     UNL     X        1        2    7.480    0.383    1.457  1.00  0.00     2       X        17
      39    H     UNL     X        1        2    7.246   -2.962   -1.236  1.00  0.00     2       X        18
      40    H     UNL     X        1        2    4.790   -2.754   -1.285  1.00  0.00     2       X        19
      41    H     UNL     X        1        2    9.260   -0.721    0.601  1.00  0.00     2       X        20

julia> read_pdb("multi.pdb", "residue 1")
   Vector{Atom{Nothing}} with 21 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8
       9    C     UNL     X        1        1    6.728   -1.679    1.570  1.00  0.00     1       X         9
      10    C     UNL     X        1        1    5.335   -1.690    1.409  1.00  0.00     1       X        10
      11    C     UNL     X        1        1    4.640   -2.983    1.648  1.00  0.00     1       X        11
      12    O     UNL     X        1        1    3.521   -3.163    0.915  1.00  0.00     1       X        12
      13    O     UNL     X        1        1    5.085   -3.843    2.389  1.00  0.00     1       X        13
      14    H     UNL     X        1        1    0.494    1.136    1.620  1.00  0.00     1       X        14
      15    H     UNL     X        1        1    0.787   -0.601    1.849  1.00  0.00     1       X        15
      16    H     UNL     X        1        1    0.852    0.128    0.206  1.00  0.00     1       X        16
      17    H     UNL     X        1        1    4.928    1.576    0.470  1.00  0.00     1       X        17
      18    H     UNL     X        1        1    7.370    1.563    0.761  1.00  0.00     1       X        18
      19    H     UNL     X        1        1    8.536   -0.507    1.480  1.00  0.00     1       X        19
      20    H     UNL     X        1        1    7.252   -2.585    1.873  1.00  0.00     1       X        20
      21    H     UNL     X        1        1    3.312   -2.419    0.319  1.00  0.00     1       X        21

julia> read_pdb("multi.pdb", "residue 2")
   Vector{Atom{Nothing}} with 20 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
      22    C     UNL     X        1        2    1.078   -0.099    0.096  1.00  0.00     2       X         1
      23    C     UNL     X        1        2    2.546    0.135    0.344  1.00  0.00     2       X         2
      24    O     UNL     X        1        2    2.933    1.177    0.857  1.00  0.00     2       X         3
      25    N     UNL     X        1        2    3.328   -0.932   -0.067  1.00  0.00     2       X         4
      26    C     UNL     X        1        2    4.731   -1.036    0.040  1.00  0.00     2       X         5
      27    C     UNL     X        1        2    5.510   -0.175    0.816  1.00  0.00     2       X         6
      28    C     UNL     X        1        2    6.902   -0.304    0.848  1.00  0.00     2       X         7
      29    C     UNL     X        1        2    7.515   -1.300    0.099  1.00  0.00     2       X         8
      30    C     UNL     X        1        2    6.757   -2.184   -0.656  1.00  0.00     2       X         9
      31    C     UNL     X        1        2    5.364   -2.059   -0.678  1.00  0.00     2       X        10
      32    O     UNL     X        1        2    8.870   -1.440    0.077  1.00  0.00     2       X        11
      33    H     UNL     X        1        2    0.545    0.855    0.152  1.00  0.00     2       X        12
      34    H     UNL     X        1        2    0.684   -0.775    0.859  1.00  0.00     2       X        13
      35    H     UNL     X        1        2    0.914   -0.523   -0.899  1.00  0.00     2       X        14
      36    H     UNL     X        1        2    2.864   -1.669   -0.582  1.00  0.00     2       X        15
      37    H     UNL     X        1        2    5.061    0.606    1.422  1.00  0.00     2       X        16
      38    H     UNL     X        1        2    7.480    0.383    1.457  1.00  0.00     2       X        17
      39    H     UNL     X        1        2    7.246   -2.962   -1.236  1.00  0.00     2       X        18
      40    H     UNL     X        1        2    4.790   -2.754   -1.285  1.00  0.00     2       X        19
      41    H     UNL     X        1        2    9.260   -0.721    0.601  1.00  0.00     2       X        20

BioStructures.jl

But with BioStructures.jl and PDBFormat, we get some errors:

julia> using BioStructures

julia> struc = read("aspirin.pdb", PDBFormat)
ERROR: Two copies of the same atom have the same alternative location ID. Existing atom:
Atom C with serial 1, coordinates [1.071, 0.277, 1.264]
New atom record to add:
AtomRecord C with serial 2, coordinates [2.533, 0.556, 1.454]
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] unsafe_addatomtomodel!(mo::Model, atom_rec::AtomRecord; remove_disorder::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1652
  [3] unsafe_addatomtomodel!
    @ ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1559 [inlined]
  [4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:66
  [5] read
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
  [6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
  [7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
  [8] open
    @ ./io.jl:407 [inlined]
  [9] #read#69
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [10] read(filepath::String, t::Type{PDBFormat})
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [11] top-level scope
    @ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.

(Same behavior for paracetamol.pdb or multi.pdb).

Same for MMCIFDict (now with aspirin.mmcif):

julia> struc = "aspirin.mmcif" |> MMCIFDict |> MolecularStructure
ERROR: KeyError: key "_atom_site.group_PDB" not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:477 [inlined]
 [2] getindex
   @ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:84 [inlined]
 [3]
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:332
 [4] MolecularStructure
   @ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:321 [inlined]
 [5] |>(x::MMCIFDict, f::Type{MolecularStructure})
   @ Base ./operators.jl:926
 [6] top-level scope
   @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Non-Open Babel

Maybe the problem is with Open Babel, right? So let's download Aspirin's SDF from PubChem and use Chemfiles.jl to convert it to PDB.

julia> using Downloads; Downloads.download("https://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/000008C400000001/SDF?response_type=save&response_basename=2244", "./2244.sdf")
"./2244.sdf"

julia> using Chemfiles

julia> traj = Trajectory("2244.sdf")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000002315a30, false), nothing)

julia> frame = read(traj)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x00000000030c9fd0, false))

julia> Trajectory("chfl.pdb", 'w') do traj
           write(traj, frame)
       end

julia> using BioStructures

julia> struc = read("chfl.pdb", PDBFormat)
ERROR: PDBParseError: could not read model serial number at line 1 of file:
MODEL    1
Stacktrace:
 [1] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:74
 [2] read
   @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
 [3] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
 [4] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
   @ Base ./io.jl:410
 [5] open
   @ ./io.jl:407 [inlined]
 [6] #read#69
   @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [7] read(filepath::String, t::Type{PDBFormat})
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [8] top-level scope
   @ REPL[9]:1

caused by: ArgumentError: input string is empty or only contains whitespace
Stacktrace:
  [1] tryparse_internal(::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
    @ Base ./parse.jl:115
  [2] #parse#619
    @ ./parse.jl:254 [inlined]
  [3] parse
    @ ./parse.jl:253 [inlined]
  [4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:72
  [5] read
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
  [6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
  [7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
  [8] open
    @ ./io.jl:407 [inlined]
  [9] #read#69
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [10] read(filepath::String, t::Type{PDBFormat})
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [11] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Finals

Also, this seem to be related to #48.

download.zip

hellmrf avatar May 29 '25 23:05 hellmrf