eager
DSL2: Add test profile that forces ALL the possible merging steps for FastQ and BAM inputs, to ensure no file name collisions happen.
Merges happen at these levels:
- pairedness (BAMs are always SE, so that should be fine)
- lane
- udg
- library_id
Current status of the allowed naming conventions, as checked in eager/subworkflows/local/utils_nfcore_eager_pipeline/main.nf:
- Mixed strandedness across different libraries of the same sample is not allowed (e.g. all data for sampleA must come from either single-stranded or double-stranded library preparations).
- Multiple PE libraries must have unique names.
NEW:
- A single library_id is only allowed to have a single type of UDG treatment.
- Removed the file collision/overwriting of finalbam/raw when raw is set as the genotyping input (the process gets repeated if run_mapdamage_rescaling, run_pmd_filtering, or run_trim_bam are set). This way only the output of the initial LIBRARY_MERGE subworkflow is kept.
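
For reviewers, the strandedness and UDG rules above can be illustrated with a small standalone sketch. This is not the Groovy code in main.nf. It is a Python approximation, and the column names (`sample_id`, `library_id`, `strandedness`, `udg_treatment`) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def find_samplesheet_violations(rows):
    """Return human-readable rule violations for a list of samplesheet row dicts.

    Hypothetical helper: column names are assumed, not taken from the pipeline.
    """
    strandedness_by_sample = defaultdict(set)
    udg_by_library = defaultdict(set)

    # Collect every strandedness seen per sample and every UDG treatment per library.
    for row in rows:
        strandedness_by_sample[row["sample_id"]].add(row["strandedness"])
        udg_by_library[row["library_id"]].add(row["udg_treatment"])

    violations = []
    # Rule: no mixed strandedness for the same sample across libraries.
    for sample, kinds in sorted(strandedness_by_sample.items()):
        if len(kinds) > 1:
            violations.append(
                f"sample {sample}: mixed strandedness {sorted(kinds)}"
            )
    # Rule: a single library_id may only have one type of UDG treatment.
    for library, treatments in sorted(udg_by_library.items()):
        if len(treatments) > 1:
            violations.append(
                f"library {library}: multiple UDG treatments {sorted(treatments)}"
            )
    return violations
```

An empty list means the rows pass both checks; each violated rule adds one message per offending sample or library.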
Todos:
- [ ] Decide on the behaviour of damage manipulation: it can currently be activated, but its output is never used downstream or saved. Ideally it would automatically save the rescaled/trimmed/manipulated sample BAMs post-merging (possibly move merge_library_genotyping into the MANIPULATE_DAMAGE subworkflow for clarity?), and allow saving multiple outputs per tool for this post-damage merging.
- [ ] check possible genotyping file collision points
- [ ] generate test dataset with as many weird input combos as possible to find file collisions.
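
One possible starting point for the last item is to enumerate input combinations programmatically. The sketch below is only an illustration: the column names, the value sets, and the `build_test_rows` helper are all assumptions, and it deliberately gives each library a single UDG treatment and each sample a single strandedness so the generated rows respect the rules listed above.

```python
from itertools import product

# Assumed value sets for illustration; adjust to what the samplesheet accepts.
PAIRMENTS = ("single", "paired")
LANES = (1, 2)
UDG_TREATMENTS = ("none", "half", "full")


def build_test_rows(sample_id="sampleA", strandedness="double"):
    """Build rows that exercise merging across pairment, lane, UDG and library.

    Hypothetical helper: one library per UDG treatment, every lane/pairment
    combination within each library.
    """
    rows = []
    for library_index, udg in enumerate(UDG_TREATMENTS, start=1):
        library_id = f"{sample_id}_lib{library_index}"
        for lane, pairment in product(LANES, PAIRMENTS):
            rows.append(
                {
                    "sample_id": sample_id,
                    "library_id": library_id,
                    "lane": lane,
                    "pairment": pairment,
                    "strandedness": strandedness,
                    "udg_treatment": udg,
                }
            )
    return rows
```

With the values assumed here this yields three libraries with four rows each, so every merge level listed above has more than one input to combine.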