Add an `--include all` option to `datasets download genome`
Is your feature request related to a problem? Please describe.
In my workflow, I frequently want to get the latest genome/annotation files for a number of RefSeq (GCF_*) and GenBank (GCA_*) genomes for further analysis. However, it can be hard to remember the exact spelling of each `--include` term, particularly when every desired file type has to be listed explicitly.
Describe the solution you'd like
As a QoL feature, I'd like to be able to save some keystrokes by typing e.g.:
datasets download genome accession GCA_005981935.1 --include all
instead of:
datasets download genome accession GCA_005981935.1 --include genome,protein,cds,gff3,gbff,seq-report
That way, the only thing I need to remember or copy/paste is the accession, rather than the accession plus the list of files.
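In the meantime, a small shell wrapper could approximate the requested behavior. This is only a sketch: `expand_include` and `datasets_genome` are hypothetical helper names, not part of the `datasets` CLI, and the meaning of "all" is assumed to be exactly the file list from the command above.

```shell
# Hypothetical helper: expand the shorthand "all" into the explicit file list
# from this issue. Any other value is passed through unchanged.
expand_include() {
  if [ "$1" = "all" ]; then
    echo "genome,protein,cds,gff3,gbff,seq-report"
  else
    echo "$1"
  fi
}

# Hypothetical wrapper around the real CLI.
# Usage: datasets_genome <accession> <include-list|all> [extra datasets flags...]
# Both positional arguments are required; anything after them is forwarded as-is.
datasets_genome() {
  accession="$1"
  include="$2"
  shift 2
  datasets download genome accession "$accession" \
    --include "$(expand_include "$include")" "$@"
}
```

With this in a shell profile, `datasets_genome GCA_005981935.1 all` would run the long form of the command, and extra flags such as `--preview` can be appended after `all`. The list is hard-coded, so it would need updating by hand if the set of file types differs.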
Thanks!
Hi @dtdoering,
Thanks for opening this issue. We will consider adding this feature in a future release.
Best, Eric
Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI
Adding another reason: since many GenBank bacterial genomes only have annotations in GenBank format (and no GFF), an `--include all` option would be very useful in combination with the `--preview` option, so that one can see which files are actually available for a given genome before deciding whether to download it or choose a different one.
That said, thanks for the info! Would love to see this added in a future release (or take a stab at a PR for it myself, pending #229)!