download https://resources.aertslab.org/cistarget/
I would like to download all files here: https://resources.aertslab.org/cistarget/
Is there a fast way to download all the contents, preserving the directory structure, without having to use zsync for individual files?
To download everything except the database Feather files (and the .zsync files), keeping the directory structure, you can use:
wget --recursive --timestamping --no-parent -R '*.feather,*.zsync' https://resources.aertslab.org/cistarget/
Downloading all Feather files by default is not recommended: the site includes old Feather v1 databases and other databases that you probably don't need. The full resources are > 600GB.
To download a specific subset of databases, first list all directories in the local copy created by the command above:
❯ find resources.aertslab.org/ -type d
resources.aertslab.org/
resources.aertslab.org/cistarget
resources.aertslab.org/cistarget/databases
resources.aertslab.org/cistarget/databases/mus_musculus
resources.aertslab.org/cistarget/databases/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen/mc_v10_clust
resources.aertslab.org/cistarget/databases/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/tc_v1
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc_v10_clust
resources.aertslab.org/cistarget/databases/old
resources.aertslab.org/cistarget/databases/old/mus_musculus
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/old/homo_sapiens
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust
resources.aertslab.org/cistarget/tf_lists
resources.aertslab.org/cistarget/regions
resources.aertslab.org/cistarget/track2tf
resources.aertslab.org/cistarget/programs
resources.aertslab.org/cistarget/motif2tf
resources.aertslab.org/cistarget/motif_collections
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/singletons
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/logos
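The listing above also includes the `old` directories, which can be filtered out before picking a subset. This is a minimal sketch, assuming the local copy sits in the current directory; `list_current_dirs` is a hypothetical helper name, not something provided by the site.

```shell
# Hypothetical helper: list directories under a local mirror,
# skipping any directory named "old" and everything below it.
list_current_dirs() {
    # "$1" is the root of the local mirror, e.g. resources.aertslab.org/
    find "$1" -type d ! -path '*/old' ! -path '*/old/*'
}
```

For example, `list_current_dirs resources.aertslab.org/ | sort` prints the remaining directories in sorted order.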
Then construct the wget command to only download that subset:
# For example, resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
wget --recursive --timestamping --no-parent -R '*.zsync' https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
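If several directories are needed, the same command can be repeated in a loop. The following is only a sketch: `dir_to_url` and `download_dirs` are hypothetical helper names, and it assumes the directory paths are written exactly as printed by `find` (host name first, no `https://` prefix).

```shell
# Hypothetical helper: turn a local mirror path into its source URL.
dir_to_url() {
    # Drop one trailing slash if present, then add the scheme and a
    # single trailing slash so the URL points at the directory itself.
    printf 'https://%s/\n' "${1%/}"
}

# Hypothetical helper: download each listed directory with the same
# wget options as above (recursive, skip .zsync files).
download_dirs() {
    for dir in "$@"; do
        wget --recursive --timestamping --no-parent -R '*.zsync' "$(dir_to_url "$dir")"
    done
}
```

Usage, run from the directory that contains the local `resources.aertslab.org/` copy so that new files land in the same tree:

    download_dirs resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust resources.aertslab.org/cistarget/motif2tf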
@ghuls
Thank you for the detailed reply. Where are you getting "> 600GB" from? I saw a file that was nearly 100GB, and another that was 33GB, in the human folder.