Snakemake workflow fails on completion with temporary files
Describe the Bug
Running a Snakemake workflow that has temporary output files makes the whole run fail. This is because Snakemake deletes temporary files while the workflow is running, so by the time the S3 copy step happens those files no longer exist.
Steps to Reproduce
- Run a Snakemake workflow that has multiple jobs with temp() outputs.
Relevant Logs
For example, see this output from a workflow I am running:
Wed, 11 May 2022 09:28:39 -0700 unlocking
Wed, 11 May 2022 09:28:39 -0700 removing lock
Wed, 11 May 2022 09:28:39 -0700 removing lock
Wed, 11 May 2022 09:28:39 -0700 removed all locks
Wed, 11 May 2022 09:28:39 -0700 Snakmake outputs are:
Wed, 11 May 2022 09:28:40 -0700 Building DAG of jobs...
Wed, 11 May 2022 09:28:41 -0700 Updating job all.
Wed, 11 May 2022 09:28:44 -0700 output_file date rule version log-file(s) status plan
Wed, 11 May 2022 09:28:44 -0700 classified/strain.tsv Wed May 11 16:28:29 2022 tabulate - ok no update
Wed, 11 May 2022 09:28:44 -0700 classified/species.tsv Wed May 11 16:28:29 2022 tabulate - ok no update
Wed, 11 May 2022 09:28:44 -0700 classified/genus.tsv Wed May 11 16:28:29 2022 tabulate - ok no update
Wed, 11 May 2022 09:28:44 -0700 classified/family.tsv Wed May 11 16:28:29 2022 tabulate - ok no update
Wed, 11 May 2022 09:28:44 -0700 qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R1_001.fastq.gz - qc - logs/qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001 missing no update
Wed, 11 May 2022 09:28:44 -0700 qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R2_001.fastq.gz - qc - logs/qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001 missing no update
[...]
Wed, 11 May 2022 09:28:45 -0700 copying outputs efs with s3
Wed, 11 May 2022 09:28:45 -0700 attempt 1 at copying classified/strain.tsv to s3://[redacted]/strain.tsv
upload: classified/strain.tsv to s3://[redacted]/strain.tsv
Wed, 11 May 2022 09:28:46 -0700 attempt 1 at copying classified/species.tsv to s3://[redacted]/species.tsv
upload: classified/species.tsv to s3://[redacted]/species.tsv
Wed, 11 May 2022 09:28:47 -0700 attempt 1 at copying classified/genus.tsv to s3://[redacted]/genus.tsv
upload: classified/genus.tsv to s3://[redacted]/genus.tsv
Wed, 11 May 2022 09:28:48 -0700 attempt 1 at copying classified/family.tsv to s3://[redacted]/family.tsv
upload: classified/family.tsv to s3://[redacted]/family.tsv
Wed, 11 May 2022 09:28:49 -0700 attempt 1 at copying qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R1_001.fastq.gz to s3://[redacted]/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R1_001.fastq.gz
Wed, 11 May 2022 09:28:49 -0700 The user-provided path qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R1_001.fastq.gz does not exist.
Wed, 11 May 2022 09:28:49 -0700 === Running Cleanup ===
Wed, 11 May 2022 09:28:49 -0700 === Bye! ===
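The summary table printed in this log already distinguishes the deleted temporary outputs: they are listed with status "missing". A minimal sketch of how a copy step could use that, assuming the summary is tab-separated with the header row shown above (output_file, date, rule, version, log-file(s), status, plan) and that the timestamp prefix added by the logging has been stripped. The function name files_to_copy and the sample rows are hypothetical, not part of snakemake.aws.sh or this run.

```python
import csv
import io
from typing import List


def files_to_copy(summary_text: str) -> List[str]:
    """Return output files from a Snakemake summary whose status is not 'missing'.

    Hypothetical helper: assumes a tab-separated summary whose header row
    contains 'output_file' and 'status' columns.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    keep = []
    for row in reader:
        # Temporary outputs that Snakemake already deleted show up as 'missing'.
        if row["status"] != "missing":
            keep.append(row["output_file"])
    return keep


if __name__ == "__main__":
    # Illustrative rows only, shaped like the summary in the log.
    summary = (
        "output_file\tdate\trule\tversion\tlog-file(s)\tstatus\tplan\n"
        "classified/strain.tsv\tWed May 11 16:28:29 2022\ttabulate\t-\t-\tok\tno update\n"
        "qc/sample_R1_001.fastq.gz\t-\tqc\t-\tlogs/qc/sample\tmissing\tno update\n"
    )
    print(files_to_copy(summary))  # ['classified/strain.tsv']
```

Relying on the status column keeps the decision with Snakemake rather than re-deriving which files were temporary.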
In my Snakefile, I have a rule that marks its QC outputs as temporary, for example:
    output:
        qcfwd=temp("qc/{sample}_R1_001.fastq.gz"),
        qcrev=temp("qc/{sample}_R2_001.fastq.gz"),
While the workflow runs, Snakemake reports that these files are deleted, for example:
Wed, 11 May 2022 08:49:49 -0700 Removing temporary output file qc/Exp01DRfullBTxA_GAACTGAGCG-CGCTCCACGA_L001_R1_001.fastq.gz.
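Because Snakemake logs each deletion, another option is to collect those paths from the run log and exclude them from the copy. A minimal sketch, assuming every deletion line ends with "Removing temporary output file <path>." exactly as in the line above and that paths contain no spaces; the helper name removed_temp_files is hypothetical.

```python
import re
from typing import Iterable, Set

# Matches the deletion message shown above, with or without a timestamp prefix.
# The trailing period belongs to the message, not to the path.
# Assumption: paths contain no whitespace.
REMOVED_TEMP = re.compile(r"Removing temporary output file (?P<path>\S+)\.$")


def removed_temp_files(log_lines: Iterable[str]) -> Set[str]:
    """Collect paths that Snakemake reported deleting as temporary outputs."""
    removed = set()
    for line in log_lines:
        match = REMOVED_TEMP.search(line.rstrip())
        if match:
            removed.add(match.group("path"))
    return removed


if __name__ == "__main__":
    log = [
        "Wed, 11 May 2022 08:49:49 -0700 Removing temporary output file qc/sample_R1_001.fastq.gz.",
        "Wed, 11 May 2022 09:28:40 -0700 Building DAG of jobs...",
    ]
    print(removed_temp_files(log))  # {'qc/sample_R1_001.fastq.gz'}
```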
Expected Behavior
The copy step should skip temporary files. In Snakemake, these are intermediate files that the workflow can "live without", and I would not want them copied back to my S3 bucket anyway.
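A minimal sketch of that expected behavior: before each upload, check whether the output still exists and skip it with a message instead of aborting the whole copy step. The real snakemake.aws.sh is a shell script; here the upload is a hypothetical upload callable passed in, so the sketch runs without AWS credentials, and copy_outputs is likewise a hypothetical name.

```python
import os
import tempfile
from typing import Callable, Iterable, List, Tuple


def copy_outputs(
    paths: Iterable[str], upload: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Upload each existing output and skip paths that are no longer on disk.

    Returns (copied, skipped). Deleted temp() outputs end up in `skipped`
    instead of failing the run.
    """
    copied, skipped = [], []
    for path in paths:
        if not os.path.exists(path):
            print(f"skipping {path}: not found (possibly a deleted temporary output)")
            skipped.append(path)
            continue
        upload(path)  # hypothetical stand-in for the S3 copy
        copied.append(path)
    return copied, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        kept = os.path.join(workdir, "strain.tsv")
        with open(kept, "w") as handle:
            handle.write("example\n")
        gone = os.path.join(workdir, "sample_R1_001.fastq.gz")  # never created
        uploaded = []
        copied, skipped = copy_outputs([kept, gone], uploaded.append)
        print(copied == [kept], skipped == [gone])  # True True
```

An existence check alone would also skip a non-temporary output that is missing for some other reason, so a real fix may want to restrict skipping to files that Snakemake marked as temporary.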
Actual Behavior
snakemake.aws.sh tries to copy temporary files and fails when it can't find them.
Additional Context
Operating System:
AGC Version: 1.4.0
Was AGC set up with a custom bucket: no
Was AGC set up with a custom VPC: no