Error in infer.clonal.models: No clonal models for sample

Open hoonghim opened this issue 6 years ago • 5 comments

Dear Ha X. Dang,

Hello, I am trying to analyze clonal evolution using PyClone and ClonEvol.

I have two WES samples from one patient.

When I followed the manual, I could not infer clonal models.

Here is my final input file for ClonEvol (it is stored in pyCloneResultMeltDcastDf below).

clonevol_input.txt

This is the original outcome from PyClone

KRCMC01270.PyClone.loci_results.txt

Below is the code for running ClonEvol:

    library(data.table)
    library(clonevol)
    library(reshape2)
    library(tidyr)

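    # Path to the PyClone loci table; the same string is reused later to build the PDF file name
    pyCloneResult <- "/Absolute path/KRCMC01270.PyClone.loci_results.txt"
    pyCloneResultDf <- fread(pyCloneResult)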

    # Change the data frame structure:
    #   [mutation_id - sample_id - cluster_id - cellular_prevalence - cellular_prevalence_std - variant_allele_frequency]
    #   -> [mutation_id - cluster_id - sample1.vaf - sample2.vaf - sample1.cellular_prevalence - sample2.cellular_prevalence - sample1.cellular_prevalence_std - sample2.cellular_prevalence_std]
    # https://stackoverflow.com/questions/11608167/reshape-multiple-value-columns-to-wide-format
    pyCloneResultMeltDf <- melt(pyCloneResultDf, id.vars=c("mutation_id", "cluster_id", "sample_id"))

    pyCloneResultMeltDcastDf <- dcast(pyCloneResultMeltDf, cluster_id + mutation_id ~ sample_id + variable)
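For readers without the attached files, the long-to-wide reshape described above can be sketched on toy data. This is only an illustration using base R's `reshape()` rather than the `melt`/`dcast` pair from the post; the mutation IDs and values are invented, and the resulting column names come out as `variable_sample` instead of the `sample_variable` order that `dcast` produces.

```r
# Toy PyClone-style long table: one row per mutation per sample (values are made up)
long <- data.frame(
  mutation_id = c("GENE1_chr1_100", "GENE1_chr1_100", "GENE2_chr2_200", "GENE2_chr2_200"),
  sample_id = c("T1", "T2", "T1", "T2"),
  cluster_id = c(0, 0, 1, 1),
  cellular_prevalence = c(0.90, 0.80, 0.30, 0.10),
  variant_allele_frequency = c(0.45, 0.40, 0.15, 0.05),
  stringsAsFactors = FALSE
)

# One row per mutation, one column per (value, sample) pair
wide <- reshape(long,
                idvar = c("mutation_id", "cluster_id"),
                timevar = "sample_id",
                direction = "wide",
                sep = "_")
rownames(wide) <- NULL
print(wide)
```

Each mutation ends up on a single row with per-sample prevalence and VAF columns, which is the layout the rest of the script expects.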

    # Cluster IDs have to start from 1, so add 1 to each cluster ID (based on the ClonEvol manual)
    pyCloneResultMeltDcastDf$cluster_id <- pyCloneResultMeltDcastDf$cluster_id + 1

    # Shorten the column names: "_variant_allele_frequency" -> "_vaf", "_cellular_prevalence" -> "_ccf", "---sampld-WBC" -> ""
    # https://stackoverflow.com/questions/28700987/data-table-setnames-combined-with-regex
    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_variant_allele_frequency", "_vaf", names(pyCloneResultMeltDcastDf)))
    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_cellular_prevalence", "_ccf", names(pyCloneResultMeltDcastDf)))

    # Remove the normal-sample information ([Tumor---Normal_vaf] -> [Tumor_vaf])
    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("---\\S+-\\S+", "", names(pyCloneResultMeltDcastDf)))

    # Change "-" (minus) into "_" (underscore)
    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("-", "_", names(pyCloneResultMeltDcastDf)))

    vaf.col.names <- grep('_vaf', colnames(pyCloneResultMeltDcastDf), value=T)
    ccf.col.names <- grep('_ccf$', colnames(pyCloneResultMeltDcastDf), value=T)
    sample.names <- gsub('_vaf', '', vaf.col.names)

    # Use the sample names as the VAF column names (multiply by 100 to get percentages)
    pyCloneResultMeltDcastDf[, sample.names] <- pyCloneResultMeltDcastDf[, vaf.col.names] * 100
    vaf.col.names <- sample.names

    # Multiply the CCF columns by 100 (from proportion to percentage)
    pyCloneResultMeltDcastDf[, ccf.col.names] <- pyCloneResultMeltDcastDf[, ccf.col.names] * 100

    # prepare sample grouping
    #sample.groups <-sample.names
    sample.groups <- c("C", "M")
    names(sample.groups) <- sample.names

    # setup the order of clusters to display in various plots (later)
    pyCloneResultMeltDcastDf <- pyCloneResultMeltDcastDf[order(pyCloneResultMeltDcastDf$cluster_id),]

    # Make a column corresponding to is.driver -> use CGC (Cancer Gene Census) genes as driver genes
    # Load CGC genes
    cgc.file <- file.path("/BiO/Share/Database/COSMIC/grch37/v90/cancer_gene_census.csv")
    cgc.df = read.csv(cgc.file, as.is = T)
    cgc.genes = unique(cgc.df$Gene.Symbol)

    pyCloneResultMeltDcastDf$CGC <- sapply(strsplit(pyCloneResultMeltDcastDf$mutation_id, "_"), function(x) x[1]) %in% cgc.genes
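Note that the line above stores the flag in a column named `CGC`, while the plotting call further down highlights by a column named `is.driver`. A minimal sketch of the same gene-symbol lookup writing to `is.driver` is shown here on toy data; `flag_drivers` is a hypothetical helper, and the mutation IDs and the driver list are invented, assuming only that the gene symbol is the part of `mutation_id` before the first underscore, as in the post.

```r
# Hypothetical helper: TRUE where the gene symbol (text before the first "_")
# is found in the supplied driver gene list
flag_drivers <- function(mutation.ids, driver.genes) {
  genes <- vapply(strsplit(mutation.ids, "_", fixed = TRUE),
                  function(x) x[1], character(1))
  genes %in% driver.genes
}

# Toy input; these IDs and genes are illustrative only
toy <- data.frame(
  mutation_id = c("TP53_17_7577120", "ABC1_1_12345", "KRAS_12_25398284"),
  stringsAsFactors = FALSE
)
toy$is.driver <- flag_drivers(toy$mutation_id, c("TP53", "KRAS"))
print(toy)
```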

    #Choosing colors for the clones
    clone.colors <- NULL

    # Visualizing the variant clusters
    outputFile <- gsub(pattern="loci_results.txt", replacement="loci_results_jitter.pdf", x = pyCloneResult)

    pdf(outputFile, width = 3, height = 3, useDingbats = FALSE, title='')
    pp <- plot.variant.clusters(pyCloneResultMeltDcastDf,
                                cluster.col.name = 'cluster',
                                show.cluster.size = FALSE,
                                cluster.size.text.color = 'blue',
                                vaf.col.names = vaf.col.names,
                                vaf.limits = 70,
                                sample.title.size = 10,
                                violin = FALSE,
                                box = FALSE,
                                jitter = TRUE,
                                jitter.shape = 1,
                                jitter.color = clone.colors,
                                jitter.size = 2,
                                jitter.alpha = 1,
                                jitter.center.method = 'median',
                                jitter.center.size = 1,
                                jitter.center.color = 'darkgray',
                                jitter.center.display.value = 'none',
                                highlight = 'is.driver',
                                highlight.shape = 21,
                                highlight.color = 'blue',
                                highlight.fill.color = 'green',
                                highlight.note.col.name = 'mutation_id',
                                highlight.note.size = 2,
                                order.by.total.vaf = FALSE)
    dev.off()

Here is the result: KRCMC01270.PyClone.loci_results_jitter.pdf

    #Plotting mean/median of clusters across samples (cluster flow)
    plot.cluster.flow(pyCloneResultMeltDcastDf, vaf.col.names = vaf.col.names,
                      sample.names = sample.names,
                      colors = clone.colors)

Here is the result. image

    # Inferring clonal evolution trees
    y = infer.clonal.models(variants = pyCloneResultMeltDcastDf,
                            cluster.col.name = 'cluster',
                            #vaf.col.names = vaf.col.names,
                            ccf.col.names = ccf.col.names,
                            sample.groups = sample.groups,
                            cancer.initiation.model='monoclonal',
                            subclonal.test = 'bootstrap',
                            subclonal.test.model = 'non-parametric',
                            num.boots = 1000,
                            founding.cluster = 1,
                            cluster.center = 'mean',
                            ignore.clusters = NULL,
                            clone.colors = clone.colors,
                            min.cluster.vaf = 0.01,
                            # min probability that CCF(clone) is non-negative
                            sum.p = 0.05,
                            # alpha level in confidence interval estimate for CCF(clone)
                            alpha = 0.05)

The following are the error messages:

    Calculate VAF as CCF/2
    Sample 1: KRCMC01270_T1_D_ccf <-- KRCMC01270_T1_D_ccf
    Sample 2: KRCMC01270_T2_D_ccf <-- KRCMC01270_T2_D_ccf
    Using monoclonal model
    Note: all VAFs were divided by 100 to convert from percentage to proportion.
    Generating non-parametric boostrap samples...
    KRCMC01270_T1_D_ccf : Enumerating clonal architectures...
    Determining if cluster VAF is significantly positive...
    Exluding clusters whose VAF < min.cluster.vaf=0.01
    Non-positive VAF clusters:
    KRCMC01270_T1_D_ccf : 0 clonal architecture model(s) found

      lab       vaf   color parent ancestors occupied      free free.mean
    4   4 0.4168754 #cab2d6     NA         -        0 0.4168754        NA
    5   5 0.3003359 #ff99ff     NA         -        0 0.3003359        NA
    3   3 0.2887949 #b2df8a     NA         -        0 0.2887949        NA
    9   9 0.2780810 #cf8d30     NA         -        0 0.2780810        NA
    6   6 0.2759430 #fdbf6f     NA         -        0 0.2759430        NA
    2   2 0.2343575 #a6cee3     NA         -        0 0.2343575        NA
    8   8 0.2068802 #bbbb77     NA         -        0 0.2068802        NA
    7   7 0.1714719 #fb9a99     NA         -        0 0.1714719        NA
    1   1 0.1211232 #cccccc     NA         -        0 0.1211232        NA
      free.lower free.upper free.confident.level free.confident.level.non.negative
    4         NA         NA                   NA                                NA
    5         NA         NA                   NA                                NA
    3         NA         NA                   NA                                NA
    9         NA         NA                   NA                                NA
    6         NA         NA                   NA                                NA
    2         NA         NA                   NA                                NA
    8         NA         NA                   NA                                NA
    7         NA         NA                   NA                                NA
    1         NA         NA                   NA                                NA
      p.value num.subclones excluded
    4      NA             0    FALSE
    5      NA             0    FALSE
    3      NA             0    FALSE
    9      NA             0    FALSE
    6      NA             0    FALSE
    2      NA             0    FALSE
    8      NA             0    FALSE
    7      NA             0    FALSE
    1      NA             0    FALSE
    ERROR: No clonal models for sample: KRCMC01270_T1_D_ccf
    Check data or remove this sample, then re-run.

    Also check if founding.cluster was set correctly!

Could you give me any idea how to solve this problem?

I think PyClone result is not very good because most variants are in cluster 1

image

Thank you in advance for your time

Sincerely,

Seung-hoon

hoonghim avatar Oct 04 '19 04:10 hoonghim

I have a similar issue: the input is from PyClone-VI with WGS data. The cluster table:

       1    2    3    4    5
      10 1805  203  116 1471

image

The code I ran:

    mutli_full_infer = infer.clonal.models(variants = multi_full,
                                           cluster.col.name = 'cluster',
                                           ccf.col.names = paste(c('A','B'),'ccf',sep=''),
                                           sample.groups = sample_groups,
                                           cancer.initiation.model='monoclonal',
                                           subclonal.test = 'bootstrap',
                                           subclonal.test.model = 'non-parametric',
                                           num.boots = 1000,
                                           founding.cluster = 1,
                                           cluster.center = 'mean',
                                           ignore.clusters = NULL,
                                           clone.colors = clone.colors,
                                           min.cluster.vaf = 0.01,
                                           sum.p = 0.05,
                                           alpha = 0.05)

The error message:

    Calculate VAF as CCF/2
    Sample 1: Accf <-- Accf
    Sample 2: Bccf <-- Bccf
    Using monoclonal model
    Note: all VAFs were divided by 100 to convert from percentage to proportion.
    Generating non-parametric boostrap samples...
    Accf : Enumerating clonal architectures...
    Determining if cluster VAF is significantly positive...
    Exluding clusters whose VAF < min.cluster.vaf=0.01
    Non-positive VAF clusters:
    Accf : 0 clonal architecture model(s) found

      lab     vaf   color parent ancestors occupied    free free.mean free.lower
    4   4 0.42025 #cab2d6     NA         -        0 0.42025        NA         NA
    5   5 0.27755 #ff99ff     NA         -        0 0.27755        NA         NA
    2   2 0.16680 #a6cee3     NA         -        0 0.16680        NA         NA
    3   3 0.09810 #b2df8a     NA         -        0 0.09810        NA         NA
    1   1 0.03360 #cccccc     NA         -        0 0.03360        NA         NA
      free.upper free.confident.level free.confident.level.non.negative p.value
    4         NA                   NA                                NA      NA
    5         NA                   NA                                NA      NA
    2         NA                   NA                                NA      NA
    3         NA                   NA                                NA      NA
    1         NA                   NA                                NA      NA
      num.subclones excluded
    4             0    FALSE
    5             0    FALSE
    2             0    FALSE
    3             0    FALSE
    1             0    FALSE
    ERROR: No clonal models for sample: Accf
    Check data or remove this sample, then re-run.

    Also check if founding.cluster was set correctly!
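The last hint in the error can be checked against the cluster tables printed in both reports: `founding.cluster = 1` was passed, yet cluster 1 has the lowest mean VAF in each table (about 0.12 and 0.03), while cluster 4 has the highest (about 0.42). Under a monoclonal model the founding clone has to contain all other clones, so its cluster would normally be the one with the highest CCF in every sample. The sketch below is one way to look for such a cluster before calling `infer.clonal.models`; `suggest_founding_cluster` is a hypothetical helper, not a ClonEvol function, and the toy data are invented.

```r
# Hypothetical helper (not part of ClonEvol): compute the mean CCF of each
# cluster in each sample and report clusters that are the maximum in all samples.
suggest_founding_cluster <- function(variants, cluster.col.name, ccf.col.names) {
  variants <- as.data.frame(variants)
  centers <- aggregate(variants[, ccf.col.names, drop = FALSE],
                       by = list(cluster = variants[[cluster.col.name]]),
                       FUN = mean)
  # TRUE for a cluster only if its mean is the largest in every sample
  top.in.all <- Reduce(`&`, lapply(ccf.col.names, function(col) {
    centers[[col]] == max(centers[[col]])
  }))
  list(centers = centers, candidates = centers$cluster[top.in.all])
}

# Toy data (invented): cluster 2 has the highest CCF in both samples
toy <- data.frame(
  cluster = c(1, 1, 2, 2, 3, 3),
  A_ccf   = c(10, 14, 80, 84, 30, 34),
  B_ccf   = c(5, 7, 90, 94, 20, 24)
)
res <- suggest_founding_cluster(toy, "cluster", c("A_ccf", "B_ccf"))
print(res$centers)
print(res$candidates)
```

If no cluster is the maximum in every sample, `candidates` is empty, which may itself indicate that the clustering or the CCF estimates need another look before a monoclonal tree can fit.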

xmzhuo avatar Mar 22 '21 22:03 xmzhuo

@hoonghim Hello hoonghim, I ran into the same problem. How did you solve it? I hope you can help; it would be very helpful for me!

edceeyuchen avatar Jun 14 '23 09:06 edceeyuchen

> @hoonghim Hello hoonghim, I ran into the same problem. How did you solve it? I hope you can help; it would be very helpful for me!

Hi, edceeyuchen

Unfortunately, I couldn't solve the issue, and the author didn't reply to my question (maybe he is busy...).

It has been about 4 years, and I still haven't solved this problem.

I think it would be helpful to find papers that use ClonEvol and provide their custom scripts in the code availability section.

Sorry for not being helpful.

Seunghoon

seunghoonv avatar Jun 18 '23 09:06 seunghoonv

Hello, try using corrected VAF or CCF. See: https://github.com/hdng/clonevol/issues/21

snowvov avatar Dec 18 '23 06:12 snowvov

OMG. I found something. @edceeyuchen

I got the same error, but after I changed the cancer.initiation.model option from "monoclonal" to "polyclonal", it worked well!!

oghzzang avatar Jan 09 '24 02:01 oghzzang