[Microarray] Assertion fails in GENERATE_SOFTWARE_TABLE when data files are not compressed

Open · cyouh95 opened this issue 1 year ago · 1 comment

Description

R.utils is used only to unzip data files when they are compressed (e.g., .CEL.gz). With uncompressed data files (e.g., .CEL), R.utils is never used, so the following assertion fails:

https://github.com/nasa/GeneLab_Data_Processing/blob/90d6bb5d6a20d817fa17ac5cb0763d4f8f75966b/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/modules/GENERATE_SOFTWARE_TABLE/resources/usr/bin/SoftwareYamlToMarkdownTable.py#L57

Solution

Modify AFFYMETRIX_SOFTWARE_DPPD to exclude R.utils when the data files are not compressed. The same change can be made to AGILENT_SOFTWARE_DPPD in the Agilent pipeline.
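
The idea can be sketched in Python as follows. This is only an illustration, not the pipeline's actual code: `EXPECTED_SOFTWARE`, `is_compressed`, and `expected_software` are hypothetical names, the software list is a made-up stand-in for whatever AFFYMETRIX_SOFTWARE_DPPD contains, and compression is assumed to be detectable from a `.gz` suffix.

```python
from typing import Iterable, List

# Hypothetical stand-in for the list of software the table is expected to contain.
EXPECTED_SOFTWARE = ["R", "oligo", "R.utils"]


def is_compressed(file_name: str) -> bool:
    """Assumption: a data file is compressed if its name ends in .gz."""
    return file_name.strip().lower().endswith(".gz")


def expected_software(
    data_files: Iterable[str],
    base: Iterable[str] = EXPECTED_SOFTWARE,
) -> List[str]:
    """Return the expected software, dropping R.utils when nothing needs unzipping."""
    needs_unzip = any(is_compressed(f) for f in data_files)
    return [name for name in base if name != "R.utils" or needs_unzip]
```

With only uncompressed inputs such as `["a.CEL", "b.CEL"]`, R.utils drops out of the expected list, so a check that every expected package appears in the software table would no longer fail.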

cyouh95, Jun 12 '24 00:06

The Array Data File Name field in the runsheet is used to determine whether the data files are compressed. Quoted commas cause an issue in splitCsv() as described here, but this can be resolved by specifying the quote parameter.
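
The quoted-comma problem is not specific to Nextflow. As a rough analogy only, the sketch below uses Python's standard `csv` module rather than splitCsv(), and every runsheet column and value other than Array Data File Name is invented for illustration. It shows how a field containing a comma is split incorrectly unless the parser is told which quote character to honor.

```python
import csv
import io

# Made-up runsheet excerpt; only "Array Data File Name" comes from the issue.
runsheet_text = (
    "Sample Name,Array Data File Name,Comment\n"
    'S1,"a.CEL.gz","liver, left lobe"\n'
)

# Naive splitting on commas ignores the quotes and yields one field too many.
naive_fields = runsheet_text.splitlines()[1].split(",")

# A quote-aware parser keeps the quoted comma inside a single field.
rows = list(csv.DictReader(io.StringIO(runsheet_text), quotechar='"'))
array_file = rows[0]["Array Data File Name"]
compressed = array_file.lower().endswith(".gz")
```

Here `naive_fields` has four entries for a three-column row, while the quote-aware parse returns the file name intact so the `.gz` check can be applied to it.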

cyouh95, Jul 02 '24 00:07