
Add an `--include all` option to `datasets download genome`

Open · dtdoering opened this issue 1 year ago · 2 comments

Is your feature request related to a problem? Please describe.

In my workflow, I frequently want to get the latest genome and annotation files for a number of RefSeq (`GCF_*`) and GenBank (`GCA_*`) genomes for further analysis. However, it can be hard to remember the exact spelling of each `--include` value, particularly when every desired file type has to be listed.

Describe the solution you'd like

As a QoL feature, I'd like to be able to save some keystrokes by typing e.g.:

```shell
datasets download genome accession GCA_005981935.1 --include all
```

instead of:

```shell
datasets download genome accession GCA_005981935.1 --include genome,protein,cds,gff3,gbff,seq-report
```

That way, the only thing I need to remember or copy/paste is the accession, rather than the accession plus the list of files.
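Until something like `--include all` exists, a client-side wrapper can approximate it. The sketch below is hypothetical: `expand_include`, `datasets_genome`, and `ALL_INCLUDE` are made-up names for illustration, not part of the `datasets` CLI, and it assumes the file-type list from the long command above is the full set you want (the real CLI may accept other values; check `datasets download genome accession --help`).

```shell
#!/usr/bin/env bash
# Hypothetical client-side workaround: `datasets` has no `--include all`
# value (that is what this issue requests), so expand "all" ourselves.

# Assumption: the full list is the one from the issue's long command.
ALL_INCLUDE="genome,protein,cds,gff3,gbff,seq-report"

# expand_include VALUE: print VALUE, replacing "all" with the full list.
expand_include() {
  if [ "$1" = "all" ]; then
    printf '%s\n' "$ALL_INCLUDE"
  else
    printf '%s\n' "$1"
  fi
}

# datasets_genome ACCESSION INCLUDE [EXTRA_ARGS...]
# Forwards to `datasets download genome accession`, expanding INCLUDE first.
datasets_genome() {
  if [ "$#" -lt 2 ]; then
    echo "usage: datasets_genome ACCESSION INCLUDE [EXTRA_ARGS...]" >&2
    return 1
  fi
  local accession="$1" include="$2"
  shift 2
  datasets download genome accession "$accession" \
    --include "$(expand_include "$include")" "$@"
}
```

With this sourced, `datasets_genome GCA_005981935.1 all` runs the long command from the issue; any other INCLUDE value (for example `gff3`) is passed through unchanged, and extra flags such as `--preview` are forwarded as-is.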

Thanks!

dtdoering, Jun 10 '24 20:06

Hi @dtdoering,

Thanks for opening this issue. We will consider adding this feature in a future release.

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI

ericcox1, Jun 11 '24 12:06

Adding another reason: many GenBank bacterial genomes have annotations only in GenBank flat-file format (and no GFF), so an `--include all` option would be very useful together with the `--preview` option. That way, one could see which files are actually available for a given genome before deciding whether to download it or choose a different one.
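The `--preview` workflow can be approximated today with a loop. This is a hypothetical sketch: `preview_all` and the one-accession-per-line input file are illustrative assumptions, and it spells out the explicit `--include` list from the issue because `--include all` does not exist yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview which file types are available for several
# genomes before downloading anything.

# preview_all FILE: FILE holds one accession per line (e.g. GCA_005981935.1).
preview_all() {
  local acc
  # `|| [ -n "$acc" ]` also handles a last line without a trailing newline.
  while IFS= read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue          # skip blank lines
    printf '== %s ==\n' "$acc"
    # </dev/null keeps `datasets` from consuming the loop's input file.
    datasets download genome accession "$acc" \
      --include genome,protein,cds,gff3,gbff,seq-report --preview </dev/null
  done < "$1"
}
```

For example, `preview_all accessions.txt` prints a header per accession followed by that accession's `--preview` output, which makes it easy to spot genomes that lack a GFF3 file.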

That said, thanks for the info! Would love to see this added in a future release (or take a stab at a PR for it myself, pending #229)!

dtdoering, Jun 12 '24 19:06