Low nhet and lcn.em NA issues
Hi, I'm analyzing a few old TCGA WXS samples (captured with NimbleGen/hg18 and VCrome/hg19), and all 8 samples have extremely low nhet (0 or 1) and lcn.em is NA. We suspected a coverage problem, but coverage is 150-200X for all of them, so that is ruled out. Any suggestions as to what might cause this? 4 are breast cancer samples and the other 4 are colorectal cancer samples, which are unlikely to have extremely few SNVs. The cluster number is ~30, so the samples should not be over-fragmented either (I just read issue ticket #39). FYI, for other samples using the SureSelect/GRCh37 capture kit, everything is fine. Thanks!
If the coverage for those samples is indeed 150x then you should get around 250k SNPs in the analysis, of which 8-10% will be hets. You can check this by looking at sum(oo$jointseg$het) and nrow(oo$jointseg), where oo is the name of the object procSample returns. It seems you don't have a hyperfragmented sample either. If you can provide the TCGA IDs of the samples I can check whether we have results for them at our end.
Venkat
The following is one of the tumor examples:
8e5f741c-996c-4b44-84c4-c9e9e5529944/TCGA-E2-A15A-01A-11D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
The normal control is:
a2d7ab5a-935c-4b96-bf38-1891fa437922/TCGA-E2-A15A-10A-01D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
Coverage is 259X for the tumor and 287X for the normal. Both are WXS samples.
I'm not sure how to check sum(oo$jointseg$het) and nrow(oo$jointseg). Are those metrics included in one of the data files resulting from the FACETS analysis? I checked the procSample-jseg file for this sample and there are 8856 segments, yet het is 0 for all of them. I can send you the file if that helps. Let me know.
Any time you have more than 300-400 segments the sample is hyperfragmented. Yours, with 8856, certainly is. So try increasing cval to see if it helps.
From the vignette the steps for running facets are:
rcmat = readSnpMatrix(datafile)
xx = preProcSample(rcmat)
oo = procSample(xx, cval = 150)
So you can run sum(oo$jointseg$het) and nrow(oo$jointseg) at the R command line right after.
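A minimal sketch of that check, for illustration only. The oo object here is a mock list standing in for the procSample output, so the snippet runs without FACETS installed; with a real run you would skip the mock and use the oo returned by procSample. The variable names n_loci, n_het and het_frac are made up for this example.

```r
# Mock stand-in for a procSample() result (assumption: real output has a
# jointseg data frame with a 0/1 'het' column, as used in the thread).
oo <- list(jointseg = data.frame(het = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0)))

n_loci   <- nrow(oo$jointseg)       # number of loci used in the analysis
n_het    <- sum(oo$jointseg$het)    # number of heterozygous loci
het_frac <- n_het / n_loci          # fraction of loci that are hets

cat("loci:", n_loci, " hets:", n_het, " het fraction:", het_frac, "\n")
```

On a real sample with adequate coverage, the het fraction would be expected to land near the 8-10% mentioned earlier in the thread; a value at or near zero matches the low nhet symptom reported here.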
Venkat
I'm not sure whether the segments are consistent across the files. In the FACETS_heterogeneity_cncf_EM file the number of segments is 33. The procSample-jseg file has 8856 entries, which I assume are segments as well? Just want to clarify and make sure. Thanks!
Hyperfragmentation should be judged from the segmentation alone, and hence prior to EM. Multiple segments that look similar are then grouped together into clusters.
No cluster was grouped in this case, as reported in the title: lcn.em is all NA. That's the issue we want to solve...
Hyperfragmented samples are a bad starting point. Nothing can be done to get reasonable results from them.
Is the number of segments given by nrow(oo$jointseg)?
No. That is the number of loci used in the analysis. The number of segments is nrow(oo$out) for the procSample output or nrow(oo$cncf) for the emcncf output.
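To keep loci and segments apart, something like the following could be used. It is only a sketch: segment_summary is a hypothetical helper, not part of FACETS, and the 400-segment cutoff is an assumption taken from the "300-400 segments" rule of thumb given earlier in this thread. The oo and fit objects are mocks (sized with the 8856 and 33 quoted above) so the code runs on its own; real objects would come from procSample and emcncf.

```r
# Hypothetical helper: reports loci and segment counts separately.
# max_segments is an assumed threshold, not a FACETS parameter.
segment_summary <- function(oo, fit = NULL, max_segments = 400) {
  n_segments <- nrow(oo$out)
  list(
    n_loci          = nrow(oo$jointseg),  # loci used in the analysis
    n_segments      = n_segments,         # segments from procSample
    n_segments_em   = if (is.null(fit)) NA_integer_ else nrow(fit$cncf),
    hyperfragmented = n_segments > max_segments
  )
}

# Mock objects for illustration only (not real FACETS output).
oo  <- list(jointseg = data.frame(het = rep(0, 8856)),
            out      = data.frame(seg = seq_len(33)))
fit <- list(cncf = data.frame(seg = seq_len(33)))

res <- segment_summary(oo, fit)
str(res)
```

With these mock sizes the helper reports 8856 loci but only 33 segments, so the hyperfragmentation flag is FALSE.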