Some reference data download links no longer work, pending investigation.

Open hsun3163 opened this issue 2 years ago • 3 comments

(py3.11) [sunh14@lc03e22 ~]$ cd /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/working
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data &
[1] 189132
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_gene_annotation --cwd ../input/reference_data &
[2] 189133
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data &
[3] 189134
(py3.11) [sunh14@lc03e22 working]$ sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data &
INFO: Running download_hg_reference:
INFO: Running download_ercc_reference:
GRCh38_ful...lus_decoy_hla.fa: <urlopen error [Errno 101] Network is unreachable>:
INFO: Running download_gene_annotation:
ERROR: download_hg_reference (id=88880766584b8229) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 0%| | 0/49087092 [00:00<?, ?it/s]
ERCC92.zip: 0%| | 0/28717 [00:00<?, ?it/s]
INFO: download_ercc_reference is completed.
INFO: download_ercc_reference output: /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.gtf /sc/arion/projects/CommonMind/roussp01a/snmulti_QTL/input/reference_data/ERCC92.fa
Homo_sapie...8.103.chr.gtf.gz: 0%|▏ | 49152/49087092 [00:00<03:33, 229662.93it/s]
INFO: Workflow download_ercc_reference (ID=w297010867a7f15c9) is executed successfully with 1 completed step.

[4] 189193
Homo_sapie...8.103.chr.gtf.gz: 0%|▍ | 172032/49087092 [00:00<01:56, 421643.95it/s]
[3]- Done sos run pipeline/reference_data.ipynb download_ercc_reference --cwd ../input/reference_data
Homo_sapie...8.103.chr.gtf.gz: 1%|█ | 360448/49087092 [00:00<01:39, 490429.95it/s]
ERROR: [download_hg_reference]: [0]:

RuntimeError                              Traceback (most recent call last)
script_8878139621259498696 in <module>
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)

RuntimeError: Failed to download {urls[0]}
Homo_sapie...8.103.chr.gtf.gz: 1%|█▏ | 425984/49087092 [00:00<01:35, 507455.58it/s]
INFO: Running download_dbsnp:
00-All.vcf.gz: <urlopen error [Errno 101] Network is unreachable>:
00-All.vcf.gz.tbi: <urlopen error [Errno 101] Network is unreachable>:
ERROR: download_dbsnp (id=eb7f9a9839feca92) returns an error.
Homo_sapie...8.103.chr.gtf.gz: 2%|██▊ | 1007616/49087092 [00:02<01:30, 532953.87it/s]
ERROR: [download_dbsnp]: [0]:

RuntimeError                              Traceback (most recent call last)
script_8177488568793545762 in <module>
----> download('ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz\nftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz.tbi\n\n', dest_dir = cwd)

RuntimeError: Failed to download ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz (2 out of 2)
Homo_sapie...8.103.chr.gtf.gz: 31%|█████████████████████████████████████████▉ | 15007744/49087092 [00:28<01:05, 518354.50it/s]

hsun3163 Dec 01 '23 15:12

These two commands fail:
sos run pipeline/reference_data.ipynb download_hg_reference --cwd ../input/reference_data
sos run pipeline/reference_data.ipynb download_dbsnp --cwd ../input/reference_data

hsun3163 Dec 01 '23 15:12

This could be a firewall blocking FTP connections.

hsun3163 Dec 01 '23 15:12
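
One way to test the firewall hypothesis on a given node is to try a plain TCP connection to each FTP host named in the failing `download(...)` calls. The following is a minimal sketch, not part of the pipeline: the helper names (`urls_from_block`, `endpoint`, `can_connect`, `check_block`) are hypothetical, and it assumes the default FTP control port 21 when the URL gives none.

```python
import socket
from urllib.parse import urlparse


def urls_from_block(url_block: str) -> list[str]:
    """Split a newline/whitespace-separated URL string into individual URLs."""
    return url_block.split()


def endpoint(url: str) -> tuple[str, int]:
    """Return (host, port) for a URL, assuming FTP control port 21 by default."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port or 21


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Try a plain TCP connection and print the OS error if it fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:  # covers errno 101, "Network is unreachable"
        print(f"{host}:{port} unreachable: {exc}")
        return False


def check_block(url_block: str) -> dict[str, bool]:
    """Check each distinct host once; return a {"host:port": reachable} map."""
    results: dict[str, bool] = {}
    for url in urls_from_block(url_block):
        host, port = endpoint(url)
        key = f"{host}:{port}"
        if key not in results:
            results[key] = can_connect(host, port)
    return results
```

Calling `check_block(...)` with the same URL string that is passed to `download(...)`, once per node, would show whether the nodes differ. A successful connection on port 21 only covers the FTP control channel, so a transfer could still fail later if the data ports are blocked.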

The download_dbsnp failure is probably due to different firewall settings on different nodes. The download_hg_reference failure is stranger: on the same node the file can be fetched with wget, but not with download() via sos.

ERROR: download_hg_reference (id=88880766584b8229) returns an error.
00-All.vcf.gz.tbi: downloaded
00-All.vcf.gz
ERROR: [download_hg_reference]: [0]:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
script_3183852603783812485 in <module>
----> download('ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa\n\n', dest_dir = cwd)


RuntimeError: Failed to download {urls[0]}
00-All.vcf.gz
(py3.11) [sunh14@dataxfer-10 working]$ ftp ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
-bash: ftp: command not found
(py3.11) [sunh14@dataxfer-10 working]$ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
--2023-12-05 12:54:58--  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
           => ‘GRCh38_full_analysis_set_plus_decoy_hla.fa’
Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... 193.62.193.167
Connecting to ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)|193.62.193.167|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /vol1/ftp/technical/reference/GRCh38_reference_genome ... done.
==> SIZE GRCh38_full_analysis_set_plus_decoy_hla.fa ... 3263683042
==> PASV ... done.    ==> RETR GRCh38_full_analysis_set_plus_decoy_hla.fa ... done.
Length: 3263683042 (3.0G) (unauthoritative)

14% [==============================>

hsun3163 Dec 05 '23 17:12
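
Since wget can start the transfer on dataxfer-10 while `download()` fails there, a possible stopgap is to fetch the same URLs with wget until the cause is found. This is a rough sketch under assumptions, not the pipeline's method: `wget_command` and `fetch_all` are hypothetical helpers, wget must be on the `PATH`, and the log above only shows the wget transfer reaching 14%, so completion is not confirmed.

```python
import subprocess
from pathlib import Path


def wget_command(url: str, dest_dir: str) -> list[str]:
    """Build a wget call: -c resumes partial files, -P sets the target directory."""
    return ["wget", "-c", "-P", str(dest_dir), url]


def fetch_all(url_block: str, dest_dir: str) -> None:
    """Fetch every URL in a newline-separated string, stopping on the first failure."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for url in url_block.split():
        # check=True raises CalledProcessError if wget exits with a non-zero status
        subprocess.run(wget_command(url, dest_dir), check=True)
```

For example, `fetch_all(urls, "../input/reference_data")` would take the same URL string as the failing `download(...)` call and write into the directory used as `--cwd` above. Whether sos then treats the step output as already present is not shown in this thread.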