
seqlevelsStyle -- supporting offline use?

Open vjcitn opened this issue 2 years ago • 1 comments

When setting seqlevelsStyle to "UCSC", it appears that a network query is always issued, which leads to failure when offline.

Can we use BiocFileCache to hold the relevant information persistently?

Would a PR be considered?

vjcitn, Sep 21 '23 14:09

See issue #26 for a discussion about this. TLDR: One concern is that there's a (small) risk that the cache data become stale after the online NCBI or UCSC data changes. A rare event but it happens sometimes. This could be mitigated by having some sort of cache expiration mechanism.
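To make the cache expiration idea concrete, here is a minimal base R sketch. It is not GenomeInfoDb or BiocFileCache code: `cached_fetch()`, its `max_age_days` argument, and the `fake_fetch()` stand-in are hypothetical names introduced only for illustration, and plain `.rds` files stand in for a real BiocFileCache. An entry is reused while it is younger than the chosen age limit, refreshed once it expires, and served stale (with a warning) if the refresh fails, for example when offline.

```r
# Hypothetical helper (not part of GenomeInfoDb or BiocFileCache):
# return a cached value unless it is older than max_age_days.
cached_fetch <- function(key, fetch_fun, cache_dir = tempdir(),
                         max_age_days = 30) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    age_days <- as.numeric(
      difftime(Sys.time(), file.info(path)$mtime, units = "days")
    )
    if (age_days <= max_age_days) {
      return(readRDS(path))  # fresh enough: no network access needed
    }
  }
  # Missing or expired entry: try to refresh it.
  # A NULL result from fetch_fun() is treated as a failed fetch.
  value <- tryCatch(fetch_fun(), error = function(e) NULL)
  if (is.null(value)) {
    # Refresh failed (e.g. offline): fall back to a stale copy if one exists.
    if (file.exists(path)) {
      warning("using stale cache entry for '", key, "'")
      return(readRDS(path))
    }
    stop("no cached copy of '", key, "' and the fetch failed")
  }
  saveRDS(value, path)
  value
}

# Stand-in for a real download of chromosome info; counts how often it runs.
n_fetches <- 0
fake_fetch <- function() {
  n_fetches <<- n_fetches + 1
  data.frame(ncbi = c("1", "MT"), ucsc = c("chr1", "chrM"))
}

cache_dir <- tempfile("chrominfo_cache_")
dir.create(cache_dir)
first  <- cached_fetch("hg38", fake_fetch, cache_dir)  # runs fake_fetch()
second <- cached_fetch("hg38", fake_fetch, cache_dir)  # served from the cache
```

The age limit is the knob that trades offline convenience against the staleness risk described above: a shorter limit picks up upstream changes sooner but needs the network more often.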

But before doing that, an improvement that is on my TODO list is to make seqlevelsStyle(x) <- "UCSC" work offline, without any caching, when seqinfo(x) contains only assembled molecules (i.e. chromosomes + mitochondrial DNA) and no scaffolds. This would probably cover most use cases. The feature would take advantage of data that is already included in the package (https://github.com/Bioconductor/GenomeInfoDb/tree/devel/inst/extdata/assembled_molecules_db/UCSC). Unlike the full sequence info, the sequence info restricted to assembled molecules is small and very stable, so it makes sense to include it in the package, at least for the most commonly used genomes.
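A rough sketch of how such an offline path could behave, in base R only. Everything here is an assumption made for illustration: the `assembled_molecules` table is written by hand for human and does not reflect the format of the files shipped in the package, and `ncbi_to_ucsc_offline()` is a hypothetical helper, not an existing GenomeInfoDb function. The point is only the decision rule: rename locally when every sequence level is an assembled molecule, and otherwise signal that the online lookup is still needed.

```r
# Illustrative NCBI -> UCSC table for human assembled molecules only.
# Assumption: hand-written for this sketch, not read from the package's files.
assembled_molecules <- data.frame(
  ncbi = c(as.character(1:22), "X", "Y", "MT"),
  ucsc = c(paste0("chr", c(1:22, "X", "Y")), "chrM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map sequence levels without network access, or
# return NULL when any level (e.g. a scaffold) is not an assembled molecule.
ncbi_to_ucsc_offline <- function(seqlevels, table = assembled_molecules) {
  idx <- match(seqlevels, table$ncbi)
  if (anyNA(idx)) {
    return(NULL)  # the caller would fall back to the online lookup
  }
  table$ucsc[idx]
}

# Only assembled molecules: resolved locally.
ncbi_to_ucsc_offline(c("1", "X", "MT"))

# Contains a scaffold accession: NULL, so the online path is still required.
ncbi_to_ucsc_offline(c("1", "KI270706.1"))
```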

hpages, Sep 21 '23 16:09