PDBIOException re-raised ValueError with obfuscating message
Setup
I am reporting a problem with Biopython version, Python version, and operating system as follows:
3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
CPython
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
1.83
Expected behaviour
The final error message — after re-raising a ValueError as a PDBIOException — is :
PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))
While the error message of its source is:
ValueError: Atom serial number ('100000') exceeds PDB format limit.
So, contrary to the comment at PDBIO.save:L390 (see traceback), the "caught" exception message is LESS informative than the original one.
Actual behaviour
The traceback shows the original error message was more informative:
ValueError Traceback (most recent call last)
File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:379, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
378 try:
--> 379 s = get_atom_line(
380 atom,
381 hetfield,
382 segid,
383 atom_number,
384 resname,
385 resseq,
386 icode,
387 chain_id,
388 )
389 except Exception as err:
390 # catch and re-raise with more information
File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:188, in PDBIO._get_atom_line(self, atom, hetfield, segid, atom_number, resname, resseq, icode, chain_id, charge)
187 if atom_number > 99999:
--> 188 raise ValueError(
189 f"Atom serial number ('{atom_number}') exceeds PDB format limit."
190 )
192 # Check if the element is valid, unknown (X), or blank
ValueError: Atom serial number ('100000') exceeds PDB format limit.
The above exception was the direct cause of the following exception:
PDBIOException Traceback (most recent call last)
Cell In[31], line 12
9 io.set_structure(structure)
10 with open(cif_out, "w") as fh:
11 #try:
---> 12 io.save(fh)
13 #except Exception as e:
14 # print(f"Could not save cif file as {cif_out}: {e}")
File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:391, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
379 s = get_atom_line(
380 atom,
381 hetfield,
(...)
387 chain_id,
388 )
389 except Exception as err:
390 # catch and re-raise with more information
--> 391 raise PDBIOException(
392 f"Error when writing atom {atom.full_id}"
393 ) from err
394 else:
395 fhandle.write(s)
PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))
Steps to reproduce
import Bio.PDB as PDB
parser = PDB.MMCIFParser(QUIET=True)
cif_file = Path("6kig-assembly1.cif") # downloaded from rcsb.org & unzipped
cif_out = Path("6kig.pdb")
structure = parser.get_structure("cif", cif_file)
io = PDB.PDBIO()
io.set_structure(structure)
with open(cif_out, "w") as fh:
#try: # I removed this in order to get the error trace
io.save(fh)
#except Exception as e:
# print(f"Could not save cif file as {cif_out}: {e}")
You're getting both exceptions with an unmodified Biopython, right?
Rather than this:
raise PDBIOException(
f"Error when writing atom {atom.full_id}"
) from err
would you prefer:
raise PDBIOException(
f"Error when writing atom {atom.full_id}: {err}"
) from err
or with a single exception:
raise PDBIOException(
f"Error when writing atom {atom.full_id}: {err}"
) from None
Or there may be another way to improve this. @etal or @JoaoRodrigues ?
Thanks for responding so promptly, @peterjc.
Yes, the biopython library I use is unmodified.
Additional information I forgot to mention:
- This exception does not abort the saving operation: the pdb file is created & the number of atoms is truncated up to the pdb format limit, silently.
- Setting
auth_residues=FalseinPDB.MMCIFParsersuppresses theio.saveerrors, but yields the same outcome: a truncated pdb file is created.
Since the root cause of the error is made explicit in the initial ValueError message, i.e:
Atom serial number ('100000') exceeds PDB format limit.
It should not be lost. So yes, propagating its message into a PDBIOException would be an improvement. Better yet: the message could mention that the pdb file created will be truncated to the atoms number limit of the pdb format.
The following modification would improve the io.save method`:
Add a parameter: truncate_ok (bool, default: True), to io.save to indicate that this is the effective behavior, and to abort the saving operation if it is set to False.
Thank you.
Thanks for reporting @CatChenal, I'll have a look this weekend. This doesn't seem great..