Defining the main genomic coordinates for each table

Open emdann opened this issue 2 years ago • 2 comments

Description of feature

For each table available via EnsemblDb (e.g. genes, promoters), we could rename the primary genomic coordinates to the column names bioframe expects by default (chrom, start, end). This would (a) mimic the behaviour in Bioconductor (where the ranges make up the IRanges column) and (b) allow selecting columns with additional coordinates in the ibis query (e.g. adding the tx_seq_start column when running genes, see #9).
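A minimal sketch of the renaming idea, using plain pandas rather than the package's actual ibis query. The helper name to_bioframe_columns, the GENE_COORD_MAP mapping, and the toy rows are hypothetical; the source column names (seq_name, gene_seq_start, gene_seq_end, tx_seq_start) are the ones mentioned in this thread.

```python
import pandas as pd

# Assumed mapping from the gene table's primary coordinate columns
# to bioframe's default column names. Not part of the package API.
GENE_COORD_MAP = {
    "seq_name": "chrom",
    "gene_seq_start": "start",
    "gene_seq_end": "end",
}


def to_bioframe_columns(df: pd.DataFrame, coord_map: dict[str, str]) -> pd.DataFrame:
    """Hypothetical helper: rename the primary coordinates, keep all other columns."""
    missing = [col for col in coord_map if col not in df.columns]
    if missing:
        raise KeyError(f"missing coordinate columns: {missing}")
    return df.rename(columns=coord_map)


# Made-up rows, only to show the shape of the result.
genes = pd.DataFrame(
    {
        "gene_id": ["g1", "g2"],
        "seq_name": ["1", "1"],
        "gene_seq_start": [100, 500],
        "gene_seq_end": [200, 900],
        "tx_seq_start": [120, 510],
    }
)

renamed = to_bioframe_columns(genes, GENE_COORD_MAP)
print(list(renamed.columns))
```

Only the primary coordinates are renamed here, so an extra coordinate column such as tx_seq_start keeps its name and cannot clash with start.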

emdann, Apr 28 '23 12:04

You can be flexible with bioframe by using the cols1 and cols2 arguments.

bioframe.overlap(genes, df2, cols1=["seq_name", "gene_seq_start", "gene_seq_end"])

nvictus, Apr 03 '24 20:04

It would be nice if you didn't have to though.

ensembldb and GenomicFeatures both return a GRanges object, which contains seqnames, ranges, and strand for the entity being queried (e.g. genes, transcripts). This is nice because it can be used directly with genomic range libraries.

We could also provide (maybe with opt-out) the main entity's features in a way that will automatically work with bioframe.

ivirshup, Apr 04 '24 11:04