Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No finite residual standard deviations
Hello,
I am trying to run the DEP package on a MaxQuant output for a paper (x) that used 21 different treatments on all 5 patient samples . I ran MaxQuant as if each sample was its own experiment and generated a proteinGroups.txt file lfq-proteinGroups.txt.
This was the code I used to create the design object. I wasn't sure of how to fill out the condition and replicate since I want to compare all of them against each other equally.
#create experiment design object
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)
When going through the DEP package example (x), I ran into an issue when testing all possible comparisons of samples in the differential enrichment analysis portion of the tutorial.
type = "all"
# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)
Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations
type = "manual"
# Test manually defined comparisons
data_diff_manual <- test_diff(data_norm, type = "manual",
test = c("X1", "X2"))
Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations
I am unsure of what mistake I made and if it has to do with my use of the DEP package or the MaxQuant output. Please let me know where I went wrong.
complet code
# Loading a package required for data handling
library("dplyr")
# Read raw file
raw = read.delim("~/proteomics-practice/data/lfq-proteinGroups.txt", stringsAsFactors = FALSE, colClasses = "character")
# The data is provided with the package
data <- raw
# We filter for contaminant proteins and decoy database hits, which are indicated by "+" in the columns "Potential.contaminants" and "Reverse", respectively.
data <- filter(data, Reverse != "+", Potential.contaminant != "+")
# Are there any duplicated gene names?
data$Gene.names %>% duplicated() %>% any()
# Make a table of duplicated gene names
data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>%
arrange(desc(frequency)) %>% filter(frequency > 1)
#make_unique(): generates unique identifiers for a proteomics dataset based on "name" and "id" columns.
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
# Are there any duplicated names?
data$name %>% duplicated() %>% any()
#get LFQ column numbers
LFQ_columns <- grep("LFQ.", colnames(data_unique))
#make LFQ values numeric
data_unique$LFQ.intensity.V1 <- as.numeric(data_unique$LFQ.intensity.V1)
data_unique$LFQ.intensity.V2 <- as.numeric(data_unique$LFQ.intensity.V2)
data_unique$LFQ.intensity.V3 <- as.numeric(data_unique$LFQ.intensity.V3)
data_unique$LFQ.intensity.V4 <- as.numeric(data_unique$LFQ.intensity.V4)
data_unique$LFQ.intensity.V5 <- as.numeric(data_unique$LFQ.intensity.V5)
#create experiment design object
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)
#Generate a SummarizedExperiment object using an experimental design
data_se <- make_se(data_unique, LFQ_columns, experimental_design)
# Generate a SummarizedExperiment object by parsing condition information from the column names
LFQ_columns <- grep("LFQ.", colnames(data_unique)) # get LFQ column numbers
data_se_parsed <- make_se_parse(data_unique, LFQ_columns)
# Let's have a look at the SummarizedExperiment object
data_se
# Plot a barplot of the protein identification overlap between samples
plot_frequency(data_se)
# Filter for proteins that are identified in all replicates of at least one condition
data_filt <- filter_missval(data_se, thr = 0)
# Less stringent filtering:
# Filter for proteins that are identified in 2 out of 3 replicates of at least one condition
data_filt2 <- filter_missval(data_se, thr = 1)
# Plot a barplot of the number of identified proteins per samples
plot_numbers(data_filt2)
# Plot a barplot of the protein identification overlap between samples
plot_coverage(data_filt)
# Normalize the data
data_norm <- normalize_vsn(data_filt)
# Visualize normalization by boxplots for all samples before and after normalization
plot_normalization(data_filt, data_norm)
# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)
# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)
# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)
# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)
# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)
I have the same issue. Did you reach out any solution?
I think it is because you have no replicates in your 'condition' variable. It's not possible to calculate standard deviations if you only have one sample per condition.