[BulkRNASeq] Handling Technical Replicates

Open J-81 opened this issue 2 years ago • 2 comments

Description

Workflow should handle technical replicates appropriately.

Approaches

DESeq2 provides a collapseReplicates function that sums counts across all samples sharing the same level of a grouping factor. The rationale has two major points:

  1. Summing, as opposed to averaging, is appropriate because it maintains the expected Poisson distribution of the counts
  2. DESeq2 is designed to normalize for library size differences. Summing technical replicates is akin to having a higher sequencing depth for a sample.
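The first point can be made explicit with a short derivation. This is a sketch that assumes the technical replicates of one sample behave as independent Poisson draws; the symbols $X_i$, $\lambda_i$, $S$ and $\bar{X}$ are introduced here for illustration only.

```latex
% Assumption: n technical replicates of one sample, with independent counts for a given gene
X_i \sim \mathrm{Poisson}(\lambda_i), \qquad i = 1, \dots, n

% The sum of independent Poisson variables is again Poisson
S = \sum_{i=1}^{n} X_i \sim \mathrm{Poisson}\!\Big(\sum_{i=1}^{n} \lambda_i\Big),
\qquad \mathbb{E}[S] = \mathrm{Var}[S] = \sum_{i=1}^{n} \lambda_i

% The mean is not Poisson: its variance is smaller than its expectation when n > 1
\bar{X} = \frac{S}{n}, \qquad
\mathbb{E}[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} \lambda_i, \qquad
\mathrm{Var}[\bar{X}] = \frac{1}{n^{2}}\sum_{i=1}^{n} \lambda_i
```

Under this assumption the summed counts keep the mean-equals-variance property, whereas averaged counts do not and are generally no longer integers.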

Suggested Implementation

Encode Technical Replicate Groups in the Runsheet

Encode technical replicates as a column in the runsheet, using one integer per technical replicate group. Eventually, this column should be derived automatically from ISA archive metadata. In the meantime, a workflow user should be able to supply a two-column CSV mapping each sample name to its technical replicate group, which will then be incorporated into the runsheet.
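A minimal sketch of how such a mapping could be merged into the runsheet is shown below. The column names (`Sample Name`, `Technical Replicate Group`), the function name, the sample data, and the assumption that the mapping CSV has no header row are all hypothetical; the issue does not specify them.

```python
import csv
import io


def add_tech_rep_groups(runsheet_rows, mapping_rows,
                        sample_col="Sample Name",
                        group_col="Technical Replicate Group"):
    """Return copies of the runsheet rows with a technical replicate group column added.

    runsheet_rows: list of dicts, one per sample (e.g. from csv.DictReader).
    mapping_rows:  list of [sample_name, group] pairs (e.g. from csv.reader).
    """
    groups = {sample: int(group) for sample, group in mapping_rows}
    # Fail loudly rather than silently leaving a sample without a group.
    missing = [row[sample_col] for row in runsheet_rows
               if row[sample_col] not in groups]
    if missing:
        raise ValueError(f"No technical replicate group for: {missing}")
    return [{**row, group_col: groups[row[sample_col]]} for row in runsheet_rows]


# Hypothetical inputs: S1_a and S1_b are technical replicates of the same sample.
runsheet_csv = "Sample Name,Factor Value\nS1_a,FLT\nS1_b,FLT\nS2_a,GC\n"
mapping_csv = "S1_a,1\nS1_b,1\nS2_a,2\n"

runsheet = list(csv.DictReader(io.StringIO(runsheet_csv)))
mapping = list(csv.reader(io.StringIO(mapping_csv)))
merged = add_tech_rep_groups(runsheet, mapping)
```

Raising an error on unmapped samples is one possible design choice; defaulting each unmapped sample to its own singleton group would be an alternative.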

Use the Technical Replicate Groups Column in the Runsheet for DESeq2 collapseReplicates

https://rdrr.io/bioc/DESeq2/man/collapseReplicates.html

Validation Plan

  1. Validate that the approach gives reasonable results as follows:

Run the following approaches:

  • NF_RCP-F_1.0.3 (i.e. no technical replicate handling)
  • collapseReplicates (summed tech. replicates)
  • median replicates
  • mean replicates
  • filter to first replicate only (drop others)

Assessment Metrics:

  • DGE results
  2. Regression Test Criteria
  • Core tests should run without change in outcomes (since core tests don't include any technical replicates)
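The four replicate-handling approaches compared in the validation plan (sum, median, mean, first replicate only) can be sketched as below. This is an illustration in plain Python, not the DESeq2 implementation: the function name, the data layout, and the example counts are hypothetical. The `sum` method mirrors what collapseReplicates does conceptually.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_technical_replicates(counts, groups, method="sum"):
    """Collapse per-sample gene counts into one vector per technical replicate group.

    counts: {sample_name: [count for gene 0, count for gene 1, ...]}
    groups: {sample_name: technical replicate group id}
    method: "sum", "mean", "median", or "first"
    """
    reducers = {
        "sum": sum,
        "mean": mean,
        "median": median,
        "first": lambda values: values[0],  # keep first replicate, drop others
    }
    reducer = reducers[method]

    # Gather the count vectors of all replicates belonging to each group.
    by_group = defaultdict(list)
    for sample, gene_counts in counts.items():
        by_group[groups[sample]].append(gene_counts)

    # zip(*replicates) regroups the values gene by gene across replicates.
    return {
        group: [reducer(per_gene) for per_gene in zip(*replicates)]
        for group, replicates in by_group.items()
    }


# Hypothetical data: three technical replicates of S1, one sample S2.
counts = {
    "S1_a": [10, 0, 5],
    "S1_b": [14, 2, 5],
    "S1_c": [12, 1, 8],
    "S2_a": [7, 3, 1],
}
groups = {"S1_a": 1, "S1_b": 1, "S1_c": 1, "S2_a": 2}

summed = collapse_technical_replicates(counts, groups, method="sum")
```

Note that the mean and median can produce non-integer values, which would need rounding or other handling before being passed to DESeq2, since it expects integer counts.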

J-81, Jun 10 '23 02:06

Implementation Steps

  • [ ] Runsheet generation now ingests optional technical replicate group table
  • [ ] DESeq2 script updated to use technical replicate group data for the four handling approaches (sum, median, mean, first replicate only)

J-81, Jun 10 '23 02:06

Additional considerations:

  • How to handle technical replicates on multiple levels (e.g. multiple tissue cuts and multiple library preps for the same biological sample)
  • Group statistics and sample counts
    • Compute them before or after collapsing replicates?
    • The group statistics code, as currently written, will need reworking to run after replicates are collapsed
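To illustrate why the ordering matters, the sketch below counts samples per condition before and after collapsing. The function name, condition labels, and sample names are hypothetical, and it assumes all technical replicates in a group share the same condition.

```python
from collections import Counter


def samples_per_condition(conditions, groups=None):
    """Count samples per condition.

    conditions: {sample_name: condition label}
    groups:     optional {sample_name: technical replicate group id};
                when given, each group is counted once (i.e. after collapsing).
    """
    if groups is None:
        return Counter(conditions.values())
    # One entry per technical replicate group; assumes replicates share a condition.
    condition_by_group = {groups[sample]: cond for sample, cond in conditions.items()}
    return Counter(condition_by_group.values())


# Hypothetical design: S1 was sequenced twice (S1_a, S1_b).
conditions = {"S1_a": "FLT", "S1_b": "FLT", "S2_a": "FLT", "S3_a": "GC"}
groups = {"S1_a": 1, "S1_b": 1, "S2_a": 2, "S3_a": 3}

before = samples_per_condition(conditions)
after = samples_per_condition(conditions, groups)
```

Here the FLT group has three samples before collapsing but only two after, so any per-group statistic or sample count depends on which stage it is computed at.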

J-81, Jun 14 '23 19:06