facets Low nhet and lcn.em NA issues

Hi, I'm analyzing a few old TCGA WXS samples (done with NimbleGen/hg18 and VCrome/hg19), and all 8 samples have extremely low nhet (0 or 1) and lcn.em is NA. We suspected a coverage problem, but coverage is 150-200X for all of them, so that is ruled out. Any suggestions as to what might cause this? 4 are breast cancer samples and the other 4 are colorectal cancer samples, which are unlikely to have extremely few SNVs. The cluster number is ~30, so they should not be over-fragmented either (I just read issue ticket #39). FYI, for other samples using the SureSelect/GRCh37 capture kit, everything is fine. Thanks!

Sep 21 '17 17:09 fangxiaolan

If the coverage for those samples is indeed 150x then you should get around 250k SNPs in the analysis of which 8-10% will be hets. You can check this by looking at the numbers sum(oo$jointseg$het) and nrow(oo$jointseg) where oo is the name of the object procSample returns. Seems like you don't have a hyperfragmented sample either. If you can provide the TCGA ID of the samples I can check if we have the results of those at our end.

Venkat

Sep 22 '17 13:09 veseshan

The following is one of the tumor examples:
8e5f741c-996c-4b44-84c4-c9e9e5529944/TCGA-E2-A15A-01A-11D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
Normal control is:
a2d7ab5a-935c-4b96-bf38-1891fa437922/TCGA-E2-A15A-10A-01D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam

And the coverage for tumor is 259X and normal is 287X. Both are WXS samples.

Sep 22 '17 14:09 fangxiaolan

I'm not sure how to check sum(oo$jointseg$het) and nrow(oo$jointseg). Are those metrics included in one of the data files produced by the FACETS analysis? I checked the procSample-jseg file for this sample: it has 8856 segments, yet het is 0 for all of them. I can send you the file if that helps. Let me know.

Sep 22 '17 14:09 fangxiaolan

Any time you have more than 300-400 segments, the sample is hyperfragmented. Yours, with 8856, certainly is. Try increasing cval to see if it helps.

From the vignette the steps for running facets are:

rcmat = readSnpMatrix(datafile)
xx = preProcSample(rcmat)
oo = procSample(xx, cval=150)

So you can issue sum(oo$jointseg$het) and nrow(oo$jointseg) in R command line right after.
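For readers who want to see the check spelled out, here is a minimal sketch in base R. It does not call facets: `oo` is a made-up toy list that only mimics the `jointseg` component (with a `het` column) of the object procSample returns, so the values are purely illustrative. In a real run `oo` would come from the three steps above, and the 8% cutoff is just the lower end of the 8-10% rule of thumb quoted earlier in this thread.

```r
# Toy stand-in for the object returned by procSample(); all values are made up.
# In a real analysis, `oo` would be the result of procSample(xx, cval = 150).
oo <- list(
  jointseg = data.frame(
    chrom  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
    maploc = c(100, 250, 400, 900, 120, 300, 610, 880, 150, 420),
    het    = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
  )
)

n_loci   <- nrow(oo$jointseg)      # loci (SNPs) used in the analysis
n_het    <- sum(oo$jointseg$het)   # loci flagged as heterozygous
het_frac <- n_het / n_loci         # fraction of loci that are hets

cat("loci:", n_loci, " hets:", n_het, " het fraction:", het_frac, "\n")

# Assumed cutoff: lower end of the 8-10% het fraction mentioned in this thread.
if (het_frac < 0.08) {
  cat("Het fraction is below the expected range\n")
}
```

If n_het comes out at 0 or 1 on real data with hundreds of thousands of loci, that is far below the expected range and points to a problem upstream of the segmentation.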

Venkat

Sep 22 '17 15:09 veseshan

I'm not sure whether the segment counts are consistent across the files. In the FACETS_heterogeneity_cncf_EM file the number of segments is 33. The procSample-jseg file has 8856 objects, which I assume are segments as well? Just want to clarify and make sure. Thanks!

Sep 22 '17 15:09 fangxiaolan

Hyperfragmentation should be judged on the segmentation alone, and hence prior to EM. Multiple segments that look similar are grouped together into clusters.

Sep 22 '17 15:09 veseshan

No clusters were grouped in this case, as reported in the title: lcn.em is all NA. That's the issue we want to solve...

Sep 22 '17 15:09 fangxiaolan

Hyperfragmented samples are a bad starting point. Nothing can be done to get reasonable results from them.

Sep 22 '17 16:09 veseshan

Is the number of segments given by nrow(oo$jointseg)?

May 30 '18 01:05 andyjslee

No. That is the number of loci used in the analysis. The number of segments is nrow(oo$out) for the procSample output or nrow(oo$cncf) for the emcncf output.
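To make the distinction concrete, here is a small sketch in base R. The helper `fragmentation_summary` and its `max_segments` argument are hypothetical (they are not part of facets), and the toy objects only mimic the `jointseg` and `out` components of a procSample result, with placeholder columns. The default threshold of 400 is the upper end of the 300-400 segment rule of thumb given earlier in this thread.

```r
# Hypothetical helper (not part of facets): count loci and segments in a
# procSample()-style object and flag likely hyperfragmentation.
fragmentation_summary <- function(oo, max_segments = 400) {
  n_loci     <- nrow(oo$jointseg)  # one row per locus used in the analysis
  n_segments <- nrow(oo$out)       # one row per segment
  list(
    n_loci          = n_loci,
    n_segments      = n_segments,
    hyperfragmented = n_segments > max_segments
  )
}

# Toy objects with made-up sizes; the columns are placeholders.
toy_ok <- list(
  jointseg = data.frame(het = rep(0, 1000)),
  out      = data.frame(seg = seq_len(8))
)
toy_fragmented <- list(
  jointseg = data.frame(het = rep(0, 1000)),
  out      = data.frame(seg = seq_len(500))
)

res_ok   <- fragmentation_summary(toy_ok)
res_frag <- fragmentation_summary(toy_fragmented)
print(res_ok$hyperfragmented)
print(res_frag$hyperfragmented)
```

For emcncf output, the same count would use nrow(oo$cncf) in place of nrow(oo$out), as noted above.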

May 30 '18 14:05 veseshan