[BulkRNASeq] Handling Technical Replicates
Description
Workflow should handle technical replicates appropriately.
Approaches
DESeq2 provides a collapseReplicates function that sums the counts of all samples sharing the same level of a grouping factor. The rationale has two major points:
- Summing, as opposed to averaging, is appropriate for maintaining the expected Poisson distribution: a sum of independent Poisson counts is itself Poisson, whereas an average is generally not an integer count.
- DESeq2 is designed to normalize for library size differences. Summing technical replicates is akin to sequencing that sample to a greater depth.
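As an illustration of the arithmetic only (this is not DESeq2 code, and the sample names, counts, and the `collapse_by_sum` helper are made up for the example), summing technical replicates per group can be sketched as:

```python
def collapse_by_sum(counts, groups):
    """Sum per-gene counts across samples that share a technical replicate group.

    counts: {sample_name: [count for each gene]}  (hypothetical toy data)
    groups: {sample_name: technical replicate group id}
    """
    collapsed = {}
    for sample, values in counts.items():
        group = groups[sample]
        if group not in collapsed:
            collapsed[group] = list(values)
        else:
            # Element-wise sum keeps every entry an integer count.
            collapsed[group] = [a + b for a, b in zip(collapsed[group], values)]
    return collapsed


counts = {
    "S1_rep1": [10, 0, 5],
    "S1_rep2": [12, 1, 3],
    "S2_rep1": [7, 2, 9],
}
groups = {"S1_rep1": 1, "S1_rep2": 1, "S2_rep1": 2}

collapsed = collapse_by_sum(counts, groups)
# Library size per collapsed sample; group 1 now looks like a deeper-sequenced sample.
library_sizes = {group: sum(values) for group, values in collapsed.items()}
```

Group 1 ends up with a larger library size than group 2, which is the kind of difference DESeq2's size factor normalization is meant to absorb.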
Suggested Implementation
Encode Technical Replicate Groups in the Runsheet
Encode technical replicates as a column in the runsheet, using one integer per technical replicate group. Eventually, this technical replicate column should be derived automatically from ISA archive metadata. In the meantime, a workflow user should be able to supply a two-column CSV mapping each sample name to its technical replicate group, which will be incorporated into the runsheet.
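A minimal sketch of how such a mapping could be merged into runsheet rows is shown below. Everything named here is an assumption for illustration: the `add_tech_rep_groups` helper, the `Sample Name` and `tech_rep_group` column names, and a header-less CSV are not taken from the existing workflow.

```python
import csv
import io


def add_tech_rep_groups(runsheet_rows, mapping_csv_text, column="tech_rep_group"):
    """Return runsheet rows with a technical replicate group column added.

    runsheet_rows: list of dicts, each with a "Sample Name" key (assumed name)
    mapping_csv_text: two-column CSV text, sample name then integer group, no header
    """
    mapping = {}
    for row in csv.reader(io.StringIO(mapping_csv_text)):
        if not row:
            continue  # skip blank lines
        name, group = row
        mapping[name.strip()] = int(group)

    # Fail loudly rather than silently leaving samples ungrouped.
    missing = [r["Sample Name"] for r in runsheet_rows if r["Sample Name"] not in mapping]
    if missing:
        raise ValueError(f"No technical replicate group for: {missing}")

    return [{**r, column: mapping[r["Sample Name"]]} for r in runsheet_rows]


runsheet = [
    {"Sample Name": "S1_rep1"},
    {"Sample Name": "S1_rep2"},
    {"Sample Name": "S2_rep1"},
]
mapping_csv = "S1_rep1,1\nS1_rep2,1\nS2_rep1,2\n"
updated = add_tech_rep_groups(runsheet, mapping_csv)
```

Raising an error for unmapped samples is one possible design choice; treating each unmapped sample as its own group would be another.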
Use Technical Replicate Groups Column in Runsheet for DESeq2 collapseReplicates
https://rdrr.io/bioc/DESeq2/man/collapseReplicates.html
Validation Plan
- Validate that each approach gives reasonable results, as follows:
Run the following approaches:
- NF_RCP-F_1.0.3 (i.e. no technical replicate handling)
- collapseReplicates (summed tech. replicates)
- median replicates
- mean replicates
- filter to first replicate only (drop others)
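To make the difference between the four handling approaches concrete, here is a small sketch on toy data. It is plain Python rather than the DESeq2 script, and the `collapse` helper, sample names, and counts are hypothetical.

```python
from statistics import mean, median


def collapse(counts, groups, how):
    """Collapse technical replicates per group using one of four strategies.

    counts: {sample_name: [count for each gene]}  (hypothetical toy data)
    groups: {sample_name: technical replicate group id}
    how: "sum", "mean", "median", or "first"
    """
    reducers = {
        "sum": sum,
        "mean": mean,
        "median": median,
        "first": lambda values: values[0],  # keep first replicate, drop the rest
    }
    reduce_fn = reducers[how]

    # Gather the replicate count vectors belonging to each group, in input order.
    by_group = {}
    for sample, values in counts.items():
        by_group.setdefault(groups[sample], []).append(values)

    # zip(*reps) walks gene by gene across the replicates of one group.
    return {
        group: [reduce_fn(list(gene_values)) for gene_values in zip(*reps)]
        for group, reps in by_group.items()
    }


counts = {
    "S1_rep1": [10, 0, 5],
    "S1_rep2": [12, 1, 3],
    "S1_rep3": [20, 2, 4],
    "S2_rep1": [7, 2, 9],
}
groups = {"S1_rep1": 1, "S1_rep2": 1, "S1_rep3": 1, "S2_rep1": 2}

results = {how: collapse(counts, groups, how) for how in ("sum", "mean", "median", "first")}
```

Note that mean and median can produce non-integer values on real data; since DESeq2 expects integer counts, those two approaches would presumably need a rounding step before being passed to it.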
Assessment Metrics:
- DGE results
- Regression Test Criteria
- Core tests should run without change in outcomes (since core tests don't include any technical replicates)
Implementation Steps
- [ ] Runsheet generation now ingests optional technical replicate group table
- [ ] DESeq2 script updated to use tech. rep group data for four handling approaches
Additional considerations:
- How to handle technical replicates on multiple levels (e.g. multiple tissue cuts and multiple library preps for the same biological sample)?
- Group statistics and sample counts
- Perform after or before collapsing replicates?
- The group statistics code, as currently written, does require reworking once replicates are collapsed