genomic-features

Conversion of seqlevel styles

Open lauradmartens opened this issue 2 years ago • 2 comments

Description of feature

Add functionality that allows translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1").

This could be similar to the seqlevelsStyle function in the R package GenomeInfoDb:

seqlevelsStyle(gr_obj) = "UCSC"
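
For illustration, a minimal Python sketch of what such a translation could look like for primary chromosomes only. The function name `convert_seqlevel` and its `style` argument are hypothetical and not part of genomic-features or GenomeInfoDb. It assumes just two styles ("UCSC" with a "chr" prefix, "NCBI" without) and the usual mitochondrial naming difference ("chrM" versus "MT"); scaffolds and alternate contigs would need a lookup table instead.

```python
# Hypothetical helper for illustration only; not an existing API.
def convert_seqlevel(name: str, style: str) -> str:
    """Convert a primary chromosome name to the "UCSC" or "NCBI" style."""
    if style not in ("UCSC", "NCBI"):
        raise ValueError(f"unsupported style: {style!r}")

    # Strip a leading "chr" so both styles reduce to the same bare name.
    bare = name[3:] if name.startswith("chr") else name

    # The mitochondrial genome is "chrM" in UCSC style and "MT" in NCBI style.
    if bare in ("M", "MT"):
        return "chrM" if style == "UCSC" else "MT"

    return f"chr{bare}" if style == "UCSC" else bare


print(convert_seqlevel("1", "UCSC"))     # chr1
print(convert_seqlevel("chrM", "NCBI"))  # MT
```

A prefix rule like this only covers the simple cases; anything beyond the primary chromosomes does not follow a regular pattern.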

lauradmartens, Apr 26 '23 14:04

In bioframe, we started doing this by providing an alias dictionary that maps all variants (including genbank IDs) to a single canonical name. Keeping track of naming "styles" for each provider and each species gets unwieldy, especially when ancillary scaffolds are considered (unlocalized, unplaced, alt).

https://bioframe.readthedocs.io/en/latest/guide-io.html#curated-genome-assembly-build-information
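
As a rough sketch of the alias-dictionary idea described above: every known variant of a sequence name maps to one canonical name, so no per-provider "style" has to be tracked. The table and the `canonicalize` function below are hypothetical and are not bioframe's actual data or API. The entries cover only two GRCh38 sequences, with UCSC names chosen as canonical by assumption.

```python
# Illustrative alias table (assumption: UCSC names are canonical).
# Each variant, including GenBank and RefSeq accessions, maps to one name.
ALIASES = {
    "chr1": "chr1",
    "1": "chr1",
    "CM000663.2": "chr1",    # GenBank accession
    "NC_000001.11": "chr1",  # RefSeq accession
    "chrM": "chrM",
    "MT": "chrM",
    "J01415.2": "chrM",      # GenBank accession
    "NC_012920.1": "chrM",   # RefSeq accession
}


def canonicalize(name: str, aliases: dict = ALIASES) -> str:
    """Return the canonical name for any known variant of a sequence name."""
    try:
        return aliases[name]
    except KeyError:
        # Fail loudly instead of guessing a name for an unknown sequence.
        raise KeyError(f"unknown sequence name: {name!r}") from None


print(canonicalize("1"))            # chr1
print(canonicalize("NC_012920.1"))  # chrM
```

Because the lookup is a flat table, unlocalized, unplaced, and alt scaffolds can be added as ordinary entries without defining a naming rule for them.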

nvictus, Apr 03 '24 19:04

@nvictus, you investigated this a bunch during the hackathon. It sounded like we ended up at:

GenomeInfoDb probably has the info we want, but doesn't really make it accessible

Right?

What did GenomeInfoDb provide that bioframe doesn't? I would imagine you've covered some of the most common cases already.

ensembldb lets the user set the seqlevels style like this: seqlevelsStyle(edb) <- "UCSC". Maybe we could do something similar via bioframe's assembly info?

EnsemblDB(connection, seq_style=bioframe.assembly_info(...))

ivirshup, Apr 05 '24 16:04