biopython icon indicating copy to clipboard operation
biopython copied to clipboard

PDBIOException re-raised ValueError with obfuscating message

Open CatChenal opened this issue 1 year ago • 3 comments

Setup

I am reporting a problem with Biopython version, Python version, and operating system as follows:

3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
CPython
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
1.83

Expected behaviour

The final error message — after re-raising a ValueError as a PDBIOException — is :

PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))

While the error message of its source is:

ValueError: Atom serial number ('100000') exceeds PDB format limit.

So, contrary to the comment at PDBIO.save:L390 (see traceback), the "caught" exception message is LESS informative than the original one.

Actual behaviour

The traceback shows the original error message was more informative:

ValueError                                Traceback (most recent call last)
File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:379, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
    378 try:
--> 379     s = get_atom_line(
    380         atom,
    381         hetfield,
    382         segid,
    383         atom_number,
    384         resname,
    385         resseq,
    386         icode,
    387         chain_id,
    388     )
    389 except Exception as err:
    390     # catch and re-raise with more information

File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:188, in PDBIO._get_atom_line(self, atom, hetfield, segid, atom_number, resname, resseq, icode, chain_id, charge)
    187 if atom_number > 99999:
--> 188     raise ValueError(
    189         f"Atom serial number ('{atom_number}') exceeds PDB format limit."
    190     )
    192 # Check if the element is valid, unknown (X), or blank

ValueError: Atom serial number ('100000') exceeds PDB format limit.

The above exception was the direct cause of the following exception:

PDBIOException                            Traceback (most recent call last)
Cell In[31], line 12
      9 io.set_structure(structure)
     10 with open(cif_out, "w") as fh:
     11     #try:
---> 12     io.save(fh)
     13     #except Exception as e:
     14     #    print(f"Could not save cif file as {cif_out}: {e}")

File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:391, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
    379     s = get_atom_line(
    380         atom,
    381         hetfield,
   (...)
    387         chain_id,
    388     )
    389 except Exception as err:
    390     # catch and re-raise with more information
--> 391     raise PDBIOException(
    392         f"Error when writing atom {atom.full_id}"
    393     ) from err
    394 else:
    395     fhandle.write(s)

PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))

Steps to reproduce

import Bio.PDB as PDB
parser = PDB.MMCIFParser(QUIET=True)

cif_file = Path("6kig-assembly1.cif") # downloaded from rcsb.org & unzipped
cif_out = Path("6kig.pdb")

structure = parser.get_structure("cif", cif_file)
io = PDB.PDBIO()
io.set_structure(structure)
with open(cif_out, "w") as fh:
    #try:    # I removed this in order to get the error trace
    io.save(fh)
    #except Exception as e:
    #    print(f"Could not save cif file as {cif_out}: {e}")

CatChenal avatar May 15 '24 20:05 CatChenal

You're getting both exceptions with an unmodified Biopython, right?

Rather than this:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}"
) from err

would you prefer:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}: {err}"
) from err

or with a single exception:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}: {err}"
) from None

Or there may be another way to improve this. @etal or @JoaoRodrigues ?

peterjc avatar May 16 '24 09:05 peterjc

Thanks for responding so promptly, @peterjc.

Yes, the biopython library I use is unmodified.

Additional information I forgot to mention:

  • This exception does not abort the saving operation: the pdb file is created & the number of atoms is truncated up to the pdb format limit, silently.
  • Setting auth_residues=False in PDB.MMCIFParser suppresses the io.save errors, but yields the same outcome: a truncated pdb file is created.

Since the root cause of the error is made explicit in the initial ValueError message, i.e:

Atom serial number ('100000') exceeds PDB format limit.

It should not be lost. So yes, propagating its message into a PDBIOException would be an improvement. Better yet: the message could mention that the pdb file created will be truncated to the atoms number limit of the pdb format.

The following modification would improve the io.save method`:

Add a parameter: truncate_ok (bool, default: True), to io.save to indicate that this is the effective behavior, and to abort the saving operation if it is set to False.

Thank you.

CatChenal avatar May 16 '24 13:05 CatChenal

Thanks for reporting @CatChenal, I'll have a look this weekend. This doesn't seem great..

JoaoRodrigues avatar May 16 '24 13:05 JoaoRodrigues