Conversion of seqlevel styles
Description of feature
Add functionality that allows translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1").
This could be similar to the seqlevelsStyle function in the R package GenomeInfoDb:
seqlevelsStyle(gr_obj) = "UCSC"
In bioframe, we started doing this by providing an alias dictionary that maps all variants (including genbank IDs) to a single canonical name. Keeping track of naming "styles" for each provider and each species gets unwieldy, especially when ancillary scaffolds are considered (unlocalized, unplaced, alt).
https://bioframe.readthedocs.io/en/latest/guide-io.html#curated-genome-assembly-build-information
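For illustration, the alias-dictionary idea can be sketched in a few lines of plain Python. This is only a minimal sketch and not bioframe's actual implementation: the `RECORDS` table, `build_alias_dict`, and `to_canonical` are hypothetical names, and the three rows are hand-written examples that should be checked against the official assembly report before any real use.

```python
# Hypothetical, hand-written name table for a few human sequences.
# Each row lists the same sequence under three naming conventions.
# These rows are illustrative; verify accessions against an assembly report.
RECORDS = [
    {"ucsc": "chr1", "ncbi": "1", "genbank": "CM000663.2"},
    {"ucsc": "chrX", "ncbi": "X", "genbank": "CM000685.2"},
    {"ucsc": "chrM", "ncbi": "MT", "genbank": "J01415.2"},
]


def build_alias_dict(records, canonical="ucsc"):
    """Map every known variant of a sequence name to one canonical name."""
    aliases = {}
    for record in records:
        target = record[canonical]
        for name in record.values():
            # The canonical name also maps to itself, so lookups are uniform.
            aliases[name] = target
    return aliases


def to_canonical(names, aliases):
    """Translate names to the canonical convention; unknown names raise KeyError."""
    return [aliases[name] for name in names]


ALIASES = build_alias_dict(RECORDS)
mixed = ["1", "chrX", "J01415.2"]
canonical_names = to_canonical(mixed, ALIASES)
```

A single flat dictionary sidesteps the question of which "style" an input name belongs to: the caller only has to pick the canonical column once.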
@nvictus, you investigated this a bunch during the hackathon. It sounded like we ended up at:
GenomeInfoDb probably has the info we want, but doesn't really make it accessible
Right?
What did GenomeInfoDb provide that bioframe doesn't? I would imagine you've covered some of the most common cases already.
ensembldb lets the user set the seqlevels style like this: seqlevelsStyle(edb) <- "UCSC". Maybe we could do something similar via bioframe's assembly info?
EnsemblDB(connection, seq_style=bioframe.assembly_info(...))
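To make the proposal a bit more concrete, here is a rough sketch of how a seq_style argument could behave, assuming the database stores Ensembl-style names internally and renames sequences only when results are returned. Everything here is hypothetical: `StyledAnnotation`, `NAME_TABLE`, and the toy gene records are stand-ins invented for illustration, not an existing API, and in practice the name table would come from something like bioframe's assembly info rather than being hard-coded.

```python
from dataclasses import dataclass

# Hypothetical name table; a real one would be loaded from curated assembly metadata.
NAME_TABLE = [
    {"ensembl": "1", "ucsc": "chr1"},
    {"ensembl": "X", "ucsc": "chrX"},
    {"ensembl": "MT", "ucsc": "chrM"},
]


@dataclass
class StyledAnnotation:
    """Toy stand-in for an annotation database that stores Ensembl-style names."""

    genes: list  # tuples of (gene_name, seqname, start, end); toy data only
    seq_style: str = "ensembl"

    def __post_init__(self):
        if self.seq_style not in NAME_TABLE[0]:
            raise ValueError(f"unknown seq_style: {self.seq_style!r}")
        # Lookup from the stored (Ensembl) name to the requested style.
        self._rename = {row["ensembl"]: row[self.seq_style] for row in NAME_TABLE}

    def fetch(self):
        """Return all gene records with sequence names in the requested style."""
        # Names missing from the table are passed through unchanged.
        return [
            (gene, self._rename.get(seqname, seqname), start, end)
            for gene, seqname, start, end in self.genes
        ]


toy_genes = [("geneA", "1", 100, 200), ("geneB", "MT", 5, 50), ("geneC", "scaffold_7", 1, 9)]
db = StyledAnnotation(genes=toy_genes, seq_style="ucsc")
ucsc_rows = db.fetch()
```

Renaming at query time keeps the stored data untouched, so the same database can serve several styles. Whether unmapped scaffolds should pass through silently, as here, or raise an error is an open design choice.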