DEP icon indicating copy to clipboard operation
DEP copied to clipboard

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No finite residual standard deviations

Open mbihie opened this issue 2 years ago • 2 comments

Hello,

I am trying to run the DEP package on a MaxQuant output for a paper (x) that used 21 different treatments on all 5 patient samples . I ran MaxQuant as if each sample was its own experiment and generated a proteinGroups.txt file lfq-proteinGroups.txt.

This was the code I used to create the design object. I wasn't sure of how to fill out the condition and replicate since I want to compare all of them against each other equally.

#create experiment design object 
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)

When going through the DEP package example (x), I ran into an issue when testing all possible comparisons of samples in the differential enrichment analysis portion of the tutorial.

type = "all"

# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations

type = "manual"

# Test manually defined comparisons
data_diff_manual <- test_diff(data_norm, type = "manual", 
                              test = c("X1", "X2"))

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations

I am unsure of what mistake I made and if it has to do with my use of the DEP package or the MaxQuant output. Please let me know where I went wrong.

complet code

# Loading a package required for data handling
library("dplyr")

# Read raw file
raw = read.delim("~/proteomics-practice/data/lfq-proteinGroups.txt", stringsAsFactors = FALSE, colClasses = "character")

# The data is provided with the package
data <- raw

# We filter for contaminant proteins and decoy database hits, which are indicated by "+" in the columns "Potential.contaminants" and "Reverse", respectively. 
data <- filter(data, Reverse != "+", Potential.contaminant != "+")

# Are there any duplicated gene names?
data$Gene.names %>% duplicated() %>% any()

# Make a table of duplicated gene names
data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% 
  arrange(desc(frequency)) %>% filter(frequency > 1)

#make_unique(): generates unique identifiers for a proteomics dataset based on "name" and "id" columns.
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

# Are there any duplicated names?
data$name %>% duplicated() %>% any()

#get LFQ column numbers
LFQ_columns <- grep("LFQ.", colnames(data_unique)) 

#make LFQ values numeric
data_unique$LFQ.intensity.V1 <- as.numeric(data_unique$LFQ.intensity.V1)
data_unique$LFQ.intensity.V2 <- as.numeric(data_unique$LFQ.intensity.V2)
data_unique$LFQ.intensity.V3 <- as.numeric(data_unique$LFQ.intensity.V3)
data_unique$LFQ.intensity.V4 <- as.numeric(data_unique$LFQ.intensity.V4)
data_unique$LFQ.intensity.V5 <- as.numeric(data_unique$LFQ.intensity.V5)

#create experiment design object 
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)

#Generate a SummarizedExperiment object using an experimental design
data_se <- make_se(data_unique, LFQ_columns, experimental_design)

# Generate a SummarizedExperiment object by parsing condition information from the column names
LFQ_columns <- grep("LFQ.", colnames(data_unique)) # get LFQ column numbers
data_se_parsed <- make_se_parse(data_unique, LFQ_columns)

# Let's have a look at the SummarizedExperiment object
data_se

# Plot a barplot of the protein identification overlap between samples
plot_frequency(data_se)

# Filter for proteins that are identified in all replicates of at least one condition
data_filt <- filter_missval(data_se, thr = 0)

# Less stringent filtering:
# Filter for proteins that are identified in 2 out of 3 replicates of at least one condition
data_filt2 <- filter_missval(data_se, thr = 1)

# Plot a barplot of the number of identified proteins per samples
plot_numbers(data_filt2)

# Plot a barplot of the protein identification overlap between samples
plot_coverage(data_filt)

# Normalize the data
data_norm <- normalize_vsn(data_filt)

# Visualize normalization by boxplots for all samples before and after normalization
plot_normalization(data_filt, data_norm)

# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)

# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)

# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)

# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)

# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)

mbihie avatar Dec 10 '23 22:12 mbihie

I have the same issue. Did you reach out any solution?

abdelrhmannabil avatar Jan 18 '25 18:01 abdelrhmannabil

I think it is because you have no replicates in your 'condition' variable. It's not possible to calculate standard deviations if you only have one sample per condition.

ecologysarah avatar Jun 10 '25 08:06 ecologysarah