Error in infer.clonal.models: No clonal models for sample
Dear Ha X. Dang,
Hello, I am trying to analyze clonal evolution using PyClone and ClonEvol.
I have two WES samples from one patient.
When I followed the manual, I could not infer clonal models.
Here is my final input file for ClonEvol (it is stored in pyCloneResultMeltDcastDf below).
This is the original outcome from PyClone
KRCMC01270.PyClone.loci_results.txt
Below is the code for utilizing ClonEvol
#########################################################################
library(data.table)
library(clonevol)
library(reshape2)
library(tidyr)
pyCloneResult <- "/Absolute path/KRCMC01270.PyClone.loci_results.txt"
pyCloneResultDf <- fread(pyCloneResult)
#To change the data frame structure:
#[mutation_id - sample_id - cluster_id - cellular_prevalence - cellular_prevalence_std - variant_allele_frequency]
#-> [mutation_id - cluster_id - sample1.vaf - sample2.vaf - sample1.cellular_prevalence - sample2.cellular_prevalence - sample1.cellular_prevalence_std - sample2.cellular_prevalence_std]
#https://stackoverflow.com/questions/11608167/reshape-multiple-value-columns-to-wide-format
pyCloneResultMeltDf <- melt(pyCloneResultDf, id.vars=c("mutation_id", "cluster_id", "sample_id"))
pyCloneResultMeltDcastDf <- dcast(pyCloneResultMeltDf, cluster_id + mutation_id ~ sample_id + variable)
#We have to start cluster id from 1, thus adding +1 to each cluster id (based on the clonevol manual)
pyCloneResultMeltDcastDf$cluster_id <- pyCloneResultMeltDcastDf$cluster_id + 1
#To shorten vaf column names: "_variant_allele_frequency" -> "_vaf", "_cellular_prevalence" -> "_ccf", "---sampld-WBC" -> ""
#https://stackoverflow.com/questions/28700987/data-table-setnames-combined-with-regex
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_variant_allele_frequency", "_vaf", names(pyCloneResultMeltDcastDf)))
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_cellular_prevalence", "_ccf", names(pyCloneResultMeltDcastDf)))
#To remove the normal information ([Tumor---Normal_vaf] -> [Tumor_vaf])
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("---\\S+-\\S+", "", names(pyCloneResultMeltDcastDf)))
#To change the - (minus) into _ (underscore)
setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("-", "_", names(pyCloneResultMeltDcastDf)))
vaf.col.names <- grep('_vaf', colnames(pyCloneResultMeltDcastDf), value=T)
ccf.col.names <- grep('_ccf$', colnames(pyCloneResultMeltDcastDf), value=T)
sample.names <- gsub('_vaf', '', vaf.col.names)
#We utilize sample names as vaf columns (multiply 100 to utilize %)
pyCloneResultMeltDcastDf[, sample.names] <- pyCloneResultMeltDcastDf[, vaf.col.names] * 100
vaf.col.names <- sample.names
#We multiply 100 to ccf column (from proportion to percentage)
pyCloneResultMeltDcastDf[, ccf.col.names] <- pyCloneResultMeltDcastDf[, ccf.col.names] * 100
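The reshaping, renaming, and scaling steps above can be sketched on a toy table. This is only an illustration under stated assumptions: it uses base R `reshape()` instead of the `melt`/`dcast` calls in the post, and the sample names (`T1`, `T2`), mutation IDs, and all values are made up.

```r
# Toy long-format table mimicking PyClone loci output (all values are made up).
loci <- data.frame(
  mutation_id = rep(c("GENE1_m1", "GENE2_m2"), each = 2),
  sample_id = rep(c("T1", "T2"), times = 2),
  cluster_id = rep(c(0L, 1L), each = 2),
  cellular_prevalence = c(0.80, 0.60, 0.30, 0.10),
  variant_allele_frequency = c(0.40, 0.30, 0.15, 0.05),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per mutation, one column per measure/sample pair.
wide <- reshape(
  loci,
  direction = "wide",
  idvar = c("mutation_id", "cluster_id"),
  timevar = "sample_id",
  v.names = c("cellular_prevalence", "variant_allele_frequency"),
  sep = "_"
)

# Rename "<measure>_<sample>" to "<sample>_ccf" / "<sample>_vaf".
names(wide) <- sub("^cellular_prevalence_(.*)$", "\\1_ccf", names(wide))
names(wide) <- sub("^variant_allele_frequency_(.*)$", "\\1_vaf", names(wide))

# PyClone cluster ids start at 0; shift them so they start at 1.
wide$cluster_id <- wide$cluster_id + 1L

# Convert proportions to percentages.
pct.cols <- grep("_(ccf|vaf)$", names(wide), value = TRUE)
wide[pct.cols] <- wide[pct.cols] * 100

rownames(wide) <- NULL
print(wide)
```

Printing the wide table before calling ClonEvol is a cheap way to confirm that every sample has both a `_ccf` and a `_vaf` column and that the values are on the percentage scale.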
# prepare sample grouping
#sample.groups <-sample.names
sample.groups <- c("C", "M")
names(sample.groups) <- sample.names
# setup the order of clusters to display in various plots (later)
pyCloneResultMeltDcastDf <- pyCloneResultMeltDcastDf[order(pyCloneResultMeltDcastDf$cluster_id),]
# To make a column which is corresponding to is.driver -> utilize CGC (cancer gene census genes) as a driver gene
# Load CGC genes
cgc.file <- file.path("/BiO/Share/Database/COSMIC/grch37/v90/cancer_gene_census.csv")
cgc.df = read.csv(cgc.file, as.is = T)
cgc.genes = unique(cgc.df$Gene.Symbol)
pyCloneResultMeltDcastDf$CGC <- sapply(strsplit(pyCloneResultMeltDcastDf$mutation_id, "_"), function(x) x[1]) %in% cgc.genes
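A minimal sketch of the driver-flagging step, assuming mutation IDs start with the gene symbol followed by an underscore. The IDs and the short driver list below are hypothetical stand-ins for the real `mutation_id` column and the CGC gene list.

```r
# Hypothetical mutation IDs of the form GENE_chrom_pos.
mutation_id <- c("TP53_17_7577120", "TTN_2_179400000", "KRAS_12_25398284")

# Hypothetical driver list standing in for the CGC gene symbols.
driver.genes <- c("TP53", "KRAS")

# Take the text before the first underscore as the gene symbol.
gene <- vapply(strsplit(mutation_id, "_", fixed = TRUE),
               function(x) x[1], character(1))

# TRUE for variants whose gene is in the driver list.
is.driver <- gene %in% driver.genes
print(data.frame(mutation_id, gene, is.driver))
```

Note that this split would mislabel any gene symbol that itself contains an underscore, so it is worth checking the extracted symbols against the input.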
#Choosing colors for the clones
clone.colors <- NULL
#Visualizing the variant clusters
outputFile <- gsub(pattern="loci_results.txt", replacement="loci_results_jitter.pdf", x = pyCloneResult)
pdf(outputFile, width = 3, height = 3, useDingbats = FALSE, title='')
pp <- plot.variant.clusters(pyCloneResultMeltDcastDf,
cluster.col.name = 'cluster',
show.cluster.size = FALSE,
cluster.size.text.color = 'blue',
vaf.col.names = vaf.col.names,
vaf.limits = 70,
sample.title.size = 10,
violin = FALSE,
box = FALSE,
jitter = TRUE,
jitter.shape = 1,
jitter.color = clone.colors,
jitter.size = 2,
jitter.alpha = 1,
jitter.center.method = 'median',
jitter.center.size = 1,
jitter.center.color = 'darkgray',
jitter.center.display.value = 'none',
highlight = 'is.driver',
highlight.shape = 21,
highlight.color = 'blue',
highlight.fill.color = 'green',
highlight.note.col.name = 'mutation_id',
highlight.note.size = 2,
order.by.total.vaf = FALSE)
dev.off()
#>> Here is the result KRCMC01270.PyClone.loci_results_jitter.pdf
#Plotting mean/median of clusters across samples (cluster flow)
plot.cluster.flow(pyCloneResultMeltDcastDf, vaf.col.names = vaf.col.names,
sample.names = sample.names,
colors = clone.colors)
Here is the result.

########################################################################
#Inferring clonal evolution trees
y = infer.clonal.models(variants = pyCloneResultMeltDcastDf,
    cluster.col.name = 'cluster',
    #vaf.col.names = vaf.col.names,
    ccf.col.names = ccf.col.names,
    sample.groups = sample.groups,
    cancer.initiation.model='monoclonal',
    subclonal.test = 'bootstrap',
    subclonal.test.model = 'non-parametric',
    num.boots = 1000,
    founding.cluster = 1,
    cluster.center = 'mean',
    ignore.clusters = NULL,
    clone.colors = clone.colors,
    min.cluster.vaf = 0.01,
    # min probability that CCF(clone) is non-negative
    sum.p = 0.05,
    # alpha level in confidence interval estimate for CCF(clone)
    alpha = 0.05)
########################################################################
###The following are the error messages
Calculate VAF as CCF/2
Sample 1: KRCMC01270_T1_D_ccf <-- KRCMC01270_T1_D_ccf
Sample 2: KRCMC01270_T2_D_ccf <-- KRCMC01270_T2_D_ccf
Using monoclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
KRCMC01270_T1_D_ccf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
KRCMC01270_T1_D_ccf : 0 clonal architecture model(s) found

  lab       vaf   color parent ancestors occupied      free free.mean
4   4 0.4168754 #cab2d6     NA         -        0 0.4168754        NA
5   5 0.3003359 #ff99ff     NA         -        0 0.3003359        NA
3   3 0.2887949 #b2df8a     NA         -        0 0.2887949        NA
9   9 0.2780810 #cf8d30     NA         -        0 0.2780810        NA
6   6 0.2759430 #fdbf6f     NA         -        0 0.2759430        NA
2   2 0.2343575 #a6cee3     NA         -        0 0.2343575        NA
8   8 0.2068802 #bbbb77     NA         -        0 0.2068802        NA
7   7 0.1714719 #fb9a99     NA         -        0 0.1714719        NA
1   1 0.1211232 #cccccc     NA         -        0 0.1211232        NA
  free.lower free.upper free.confident.level free.confident.level.non.negative
4         NA         NA                   NA                                NA
5         NA         NA                   NA                                NA
3         NA         NA                   NA                                NA
9         NA         NA                   NA                                NA
6         NA         NA                   NA                                NA
2         NA         NA                   NA                                NA
8         NA         NA                   NA                                NA
7         NA         NA                   NA                                NA
1         NA         NA                   NA                                NA
  p.value num.subclones excluded
4      NA             0    FALSE
5      NA             0    FALSE
3      NA             0    FALSE
9      NA             0    FALSE
6      NA             0    FALSE
2      NA             0    FALSE
8      NA             0    FALSE
7      NA             0    FALSE
1      NA             0    FALSE
ERROR: No clonal models for sample: KRCMC01270_T1_D_ccf
Check data or remove this sample, then re-run.
Also check if founding.cluster was set correctly!
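The message asks to check `founding.cluster`, so a quick look at the cluster centers may help. The sketch below copies the `vaf` column from the table printed above and finds the cluster with the highest center. It assumes (this is an assumption, not something the log confirms) that a monoclonal model expects the founding cluster to have the highest VAF/CCF, since every other clone would descend from it. The helper `relabel.founder` is hypothetical and not part of ClonEvol.

```r
# Cluster centers copied from the table printed in the error output
# (sample KRCMC01270_T1_D_ccf); names are the cluster labels.
centers <- c(`4` = 0.4168754, `5` = 0.3003359, `3` = 0.2887949,
             `9` = 0.2780810, `6` = 0.2759430, `2` = 0.2343575,
             `8` = 0.2068802, `7` = 0.1714719, `1` = 0.1211232)

# Label of the cluster with the highest center in this sample.
top <- names(centers)[which.max(centers)]
cat("Cluster with the highest center:", top, "\n")
cat("Center of cluster 1 (used as founding.cluster):", centers[["1"]], "\n")

# Hypothetical helper: swap labels so that `founder` becomes cluster 1
# and the old cluster 1 takes the founder's former label.
relabel.founder <- function(cluster, founder) {
  out <- cluster
  out[cluster == founder] <- 1L
  out[cluster == 1L] <- founder
  out
}

# Example on a small made-up vector of cluster labels.
relabel.founder(c(1L, 4L, 4L, 2L), founder = 4L)
```

In this table cluster 1 has the lowest center and cluster 4 the highest. If the same holds in the other sample, either passing that label as `founding.cluster` or relabeling the clusters before the call could be worth trying. Only one sample's table is shown here, so the second sample would need the same check.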
Could you give me any idea how to solve this problem?
I think the PyClone result is not very good because most variants are in cluster 1.

Thank you in advance for your time
Sincerely,
Seung-hoon
I have a similar issue: my input is from PyClone-VI with WGS data. The cluster table:
   1    2    3    4    5
  10 1805  203  116 1471

#The code I run
mutli_full_infer = infer.clonal.models(variants = multi_full,
    cluster.col.name = 'cluster',
    ccf.col.names = paste(c('A','B'),'ccf',sep=''),
    sample.groups = sample_groups,
    cancer.initiation.model='monoclonal',
    subclonal.test = 'bootstrap',
    subclonal.test.model = 'non-parametric',
    num.boots = 1000,
    founding.cluster = 1,
    cluster.center = 'mean',
    ignore.clusters = NULL,
    clone.colors = clone.colors,
    min.cluster.vaf = 0.01,
    sum.p = 0.05,
    alpha = 0.05)
#error message
Calculate VAF as CCF/2
Sample 1: Accf <-- Accf
Sample 2: Bccf <-- Bccf
Using monoclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
Accf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
Accf : 0 clonal architecture model(s) found
  lab     vaf   color parent ancestors occupied    free free.mean free.lower
4   4 0.42025 #cab2d6     NA         -        0 0.42025        NA         NA
5   5 0.27755 #ff99ff     NA         -        0 0.27755        NA         NA
2   2 0.16680 #a6cee3     NA         -        0 0.16680        NA         NA
3   3 0.09810 #b2df8a     NA         -        0 0.09810        NA         NA
1   1 0.03360 #cccccc     NA         -        0 0.03360        NA         NA
  free.upper free.confident.level free.confident.level.non.negative p.value
4         NA                   NA                                NA      NA
5         NA                   NA                                NA      NA
2         NA                   NA                                NA      NA
3         NA                   NA                                NA      NA
1         NA                   NA                                NA      NA
  num.subclones excluded
4             0    FALSE
5             0    FALSE
2             0    FALSE
3             0    FALSE
1             0    FALSE
ERROR: No clonal models for sample: Accf
Check data or remove this sample, then re-run.
Also check if founding.cluster was set correctly!
@hoonghim Hello hoonghim, I ran into the same problem. How did you solve it? Any help would be greatly appreciated!
Hi, edceeyuchen
Unfortunately, I couldn't solve the issue. And the author didn't reply to my question (maybe he is busy...).
It has been about 4 years, and I still have not solved this problem.
I think it would be helpful to find papers that use ClonEvol and provide their custom scripts in their code availability sections.
Sorry for not being helpful.
Seunghoon
Hello, try using corrected VAF or CCF. See: https://github.com/hdng/clonevol/issues/21
OMG. I found something. @edceeyuchen
I got the same error, but after I changed the cancer.initiation.model option from "monoclonal" to "polyclonal", it worked well!