
DSL2: Add test profile that forces ALL the possible merging steps for FastQ and BAM inputs, to ensure no file name collisions happen.

Open TCLamnidis opened this issue 2 years ago • 1 comment

Merges happen at these levels:

  • pairness (single-end vs. paired-end; BAMs are always SE, so that should be fine)
  • lane
  • udg
  • library_id
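To make the levels concrete, here is a rough Python sketch of how inputs collapse as each merge is applied. It assumes the merges run in the order listed, and the field names (`sample_id`, `pairment`, `udg`, and so on) are illustrative rather than taken from the pipeline code.

```python
# Hypothetical metadata keys; the real pipeline's field names may differ.
MERGE_ORDER = ["pairment", "lane", "udg", "library_id"]


def merge_on(records, key):
    """Collapse records that differ only in `key`, dropping that key."""
    merged = {}
    for rec in records:
        rest = {k: v for k, v in rec.items() if k != key}
        # Records with identical remaining metadata end up in one merged unit.
        merged[tuple(sorted(rest.items()))] = rest
    return list(merged.values())


def counts_per_level(records):
    """Return the number of distinct units left after each merge level."""
    counts = []
    for key in MERGE_ORDER:
        records = merge_on(records, key)
        counts.append((key, len(records)))
    return counts


inputs = [
    {"sample_id": "A", "library_id": "L1", "udg": "none", "lane": 1, "pairment": "paired"},
    {"sample_id": "A", "library_id": "L1", "udg": "none", "lane": 2, "pairment": "paired"},
    {"sample_id": "A", "library_id": "L2", "udg": "half", "lane": 1, "pairment": "single"},
]
level_counts = counts_per_level(inputs)
```

Each level is a point where several inputs become one output, so each is a place where output file names would need to stay distinct.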

TCLamnidis, May 25 '23 12:05

Current status of the allowed naming conventions:

Checked in eager/subworkflows/local/utils_nfcore_eager_pipeline/main.nf

  • No mixed strandedness is allowed for the same sample across different libraries (e.g. all data for sampleA must come from single-stranded library preparations, or all from double-stranded ones).
  • Multiple PE libraries must have unique names.

NEW:

  • A single library ID may have only one type of UDG treatment.
  • Removed the file collision/overwriting of finalbam/raw when raw is set as the genotyping input (the process is repeated if run_mapdamage_rescaling, run_pmd_filtering, or run_trim_bam is set); this way, only the version from the initial LIBRARY_MERGE subworkflow is kept.
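For illustration only, the strandedness and UDG rules above could be expressed roughly as follows. This is a Python sketch, not the Groovy code in main.nf; the row keys (`sample_id`, `library_id`, `strandedness`, `udg`) and the function name `validate_samplesheet` are assumptions, and the unique-name rule for PE libraries is left out.

```python
from collections import defaultdict


def validate_samplesheet(rows):
    """Return a list of rule violations (empty if the sheet passes)."""
    strandedness = defaultdict(set)  # sample_id -> strandedness values seen
    udg = defaultdict(set)           # library_id -> UDG treatments seen
    for row in rows:
        strandedness[row["sample_id"]].add(row["strandedness"])
        udg[row["library_id"]].add(row["udg"])

    errors = []
    for sample, values in sorted(strandedness.items()):
        if len(values) > 1:
            errors.append(f"{sample}: mixed strandedness {sorted(values)}")
    for library, values in sorted(udg.items()):
        if len(values) > 1:
            errors.append(f"{library}: multiple UDG treatments {sorted(values)}")
    return errors


bad_sheet = [
    {"sample_id": "sampleA", "library_id": "lib1", "strandedness": "double", "udg": "none"},
    {"sample_id": "sampleA", "library_id": "lib1", "strandedness": "single", "udg": "half"},
]
good_sheet = [
    {"sample_id": "sampleA", "library_id": "lib1", "strandedness": "double", "udg": "none"},
    {"sample_id": "sampleA", "library_id": "lib2", "strandedness": "double", "udg": "half"},
]
bad_errors = validate_samplesheet(bad_sheet)
good_errors = validate_samplesheet(good_sheet)
```

Collecting all violations before reporting, rather than stopping at the first one, lets a user fix the whole samplesheet in one pass.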

Todos:

  • [ ] Decide on the behavior for damage manipulation: currently it can be activated, but its output is never used downstream or saved. Ideally, it would automatically save the rescaled/trimmed/manipulated sample BAMs after merging (possibly move merge_library_genotyping into the MANIPULATE_DAMAGE subworkflow for clarity?) and allow saving multiple outputs per tool for this post-damage merging.
  • [ ] check possible genotyping file collision points
  • [ ] generate test dataset with as many weird input combos as possible to find file collisions.
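One possible way to approach the last two items, sketched in Python: enumerate input combinations, compute the output name each would get, and report any name shared by more than one input. The naming functions here (`naive_name`, `full_name`) are hypothetical stand-ins, not the pipeline's actual output naming, and the field names are assumed.

```python
from collections import defaultdict
from itertools import product


def make_rows():
    """Enumerate library x lane x pairment combinations for one sample."""
    # Each library carries one fixed UDG treatment, per the rule above.
    libraries = [("lib1", "none"), ("lib2", "half")]
    rows = []
    for (library, udg), lane, pairment in product(libraries, [1, 2], ["single", "paired"]):
        rows.append({"sample_id": "sampleA", "library_id": library,
                     "udg": udg, "lane": lane, "pairment": pairment})
    return rows


def find_collisions(rows, name_for):
    """Map each output name to the inputs that would write to it, if more than one."""
    by_name = defaultdict(list)
    for row in rows:
        by_name[name_for(row)].append(row)
    return {name: hits for name, hits in by_name.items() if len(hits) > 1}


def naive_name(row):
    # Hypothetical scheme that ignores lane and pairment.
    return f"{row['sample_id']}_{row['library_id']}.bam"


def full_name(row):
    # Hypothetical scheme that encodes every distinguishing field.
    return f"{row['sample_id']}_{row['library_id']}_L{row['lane']}_{row['pairment']}.bam"


rows = make_rows()
naive_collisions = find_collisions(rows, naive_name)
full_collisions = find_collisions(rows, full_name)
```

With these assumed schemes, the naive names collide for every library, while the fully qualified names do not. The same check could be run against the real file names a test profile produces.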

ilight1542, Mar 24 '25 15:03