SeqFeature cleanup

Open mdehoon opened this issue 3 years ago • 2 comments

I am finding code to parse GenBank-style location strings into SeqFeature objects in multiple places in Biopython:

  • In Bio/SwissProt/__init__.py, in the _read_ft function;
  • In Bio/SeqIO/SwissIO.py, in the _make_position function;
  • In Bio/GenBank/__init__.py, in the _pos function.

Note that all three functions are private.

It would be better to replace these three functions with a single public function, perhaps SeqFeature.fromstring.
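To illustrate the idea, here is a minimal sketch of what a single shared parser for simple location strings might look like. It uses only the standard library; the names `ParsedPosition`, `ParsedLocation`, `parse_position` and `parse_location` are hypothetical stand-ins, not existing Biopython API. It handles only plain positions and `start..end` ranges with optional `<`, `>` or `?` markers, not compound locations such as `join(...)` or `complement(...)`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedPosition:
    """Hypothetical stand-in for a position object (not a Biopython class)."""

    value: Optional[int]  # None when the position is unknown ("?")
    fuzzy: str = ""  # "", "<", ">" or "?"


@dataclass(frozen=True)
class ParsedLocation:
    """Hypothetical stand-in for a location: 0-based start, exclusive end."""

    start: ParsedPosition
    end: ParsedPosition


# An optional marker (<, > or ?) followed by optional digits.
_POSITION = re.compile(r"^([<>?]?)(\d*)$")


def parse_position(text, offset=0):
    """Parse one position such as '12', '<12', '>12', '?12' or '?'."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"Cannot parse position {text!r}")
    marker, digits = match.groups()
    if not digits:
        # Only a bare "?" is allowed to have no digits.
        if marker != "?":
            raise ValueError(f"Cannot parse position {text!r}")
        return ParsedPosition(None, "?")
    return ParsedPosition(int(digits) + offset, marker)


def parse_location(text):
    """Parse 'start..end' or a single position into a ParsedLocation."""
    start_text, separator, end_text = text.strip().partition("..")
    if not separator:
        # A single position such as '42' covers exactly one residue.
        end_text = start_text
    # Convert the 1-based inclusive start to a 0-based start;
    # the 1-based inclusive end equals the 0-based exclusive end.
    start = parse_position(start_text, offset=-1)
    end = parse_position(end_text)
    return ParsedLocation(start, end)
```

With this sketch, `parse_location("12..34")` gives a start of 11 and an end of 34, and `parse_location("?")` gives unknown start and end positions. Anything it does not recognise raises `ValueError`.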

mdehoon, Aug 29 '22 00:08

attn @peterjc

mdehoon, Aug 29 '22 00:08

Looks like _make_position in Bio/SeqIO/SwissIO.py is unused, probably an oversight from https://github.com/biopython/biopython/pull/2484 - it can probably be removed now.

As for SwissProt vs GenBank/EMBL, the formats overlap but are not identical. As I recall, SwissProt locations are far simpler, although a location can be just "?".

I had previously thought about pulling the GenBank/EMBL location parsing into a user-facing function, as you are suggesting.

peterjc, Aug 29 '22 08:08