diff_analysis for picrust2 predicted pathway abundance table?
Hi, I just trying to apply diff_analysis function to the phyloseq object built with picrust2-predicted pathway abundance table.
The otu-table is like this:
| pathway | sampleD1-10F | sampleD1-10 | sampleD1-11F | sampleD1-11 | sampleD1-12F |
|---|---|---|---|---|---|
| pathway1 | 11600.31 | 10598.87 | 13448.91 | 13520.78 | 12997.12 |
| pathway2 | 0 | 0 | 0 | 0 | 0 |
| pathway3 | 23.7838 | 70.1857 | 161.0672 | 85.7279 | 62.5566 |
| pathway4 | 10073.85 | 8500.755 | 14023.53 | 9969.657 | 11871.62 |
| pathway5 | 14118.24 | 13545.21 | 17733.63 | 17366.78 | 15573.78 |
| pathway6 | 4106.691 | 3976.353 | 5092.591 | 4838.09 | 4042.221 |
The tax_table is like this:
| pathway | Rank1 |
|---|---|
| pathway1 | N10-formyl-tetrahydrofolate_biosynthesis |
| pathway2 | 4-hydroxyphenylacetate_degradation |
| pathway3 | superpathway_of_chorismate_metabolism |
| pathway4 | homolactic_fermentation |
| pathway5 | glycolysis_III_(from_glucose) |
| pathway6 | superpathway_of_arginine_and_polyamine_biosynthesis |
There was an error happened when running the diff_analysis analysis:
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
The code is here:
metadata <- read.csv("D:/16S_sequencing_data/YumingWang/metadata.txt",sep = "\t")
metadata <- subset(metadata, Group %in% c("High","Low"))
rownames(metadata) <- metadata$sample.id
test_cyc_path <- read.table("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/path_abun_unstrat1.tsv", header=TRUE, sep="\t", row.names=1)
pathway_des <- openxlsx::read.xlsx("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/pathway_description.xlsx", colNames=T, rowNames=T)
colnames(test_cyc_path) <- gsub("[.]", "-", colnames(test_cyc_path))
ps_cyc_path <- phyloseq(sample_data(metadata),
tax_table(as.matrix(pathway_des)),
otu_table(as.matrix(test_cyc_path),taxa_are_rows=T))
ps_xx <- microbiome::transform(ps_xx, "compositional")
ps_xx1 <- subset_taxa(subset_samples(ps_xx,Study=="1st" & Location=="stool"&Day=="28"))
head(data.frame(sample_data(ps_xx1)))
sample.id Location Group Day Swine_ID Study X10_16S_Copies_mg
sampleD28-10F sampleD28-10F stool High 28 10 1st 1.520 sampleD28-11F sampleD28-11F stool High 28 11 1st 4.240 sampleD28-12F sampleD28-12F stool High 28 12 1st 9.600 sampleD28-13F sampleD28-13F stool Low 28 13 1st 0.977 sampleD28-14F sampleD28-14F stool Low 28 14 1st 5.140 sampleD28-15F sampleD28-15F stool Low 28 15 1st 8.860
diffres <- diff_analysis(ps_xx1,
classgroup="Group",
mlfun="lda",
filtermod="fdr",
firstcomfun = "kruskal.test",
firstalpha=0.05,
strictmod=TRUE,
secondcomfun = "wilcox.test",
subclmin=3,
subclwilc=TRUE,
secondalpha=0.05,
lda=3)
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
Is there any method to fix this error? Many thanks!
The old version don't support the function abundance table, but the newest version (github) has supported it. Note, if you don't want to use the relative abundance, and you want to use the table you has processed previously. You should set standard_method argument to count , which mean that original input dataset (you has processed previously,eg: this is efficient to use the count data after rarefing) will be used. Of course, you also can use the method of decostand in vegan to standardize.
Yes, you were right! I used the relative abundance pathway data (ps_xx <- microbiome::transform(ps_xx, "compositional")) to perform the diff_analysis. Also, I've installed the newest version (github). However, the error was still there. It seems to be that something related with the row.names is wrong but I don't know how to fix it. Could you provide a demo function abundance table and tax_table that could perform the analysis successfully? Thank you for your help! Hongbin
You can try to add type="others" in diff_analysis. I could not get more detail information,or would you please send the phyloseq class to me by email.
Thanks for your help! I've save the ps object and sent it to your email [email protected] !
Ok,I viewed table of the pathway annotation, it only have one rank, don't contain hierarchical relationship. In the condition, the input is phyloseq class, it was not be supported. There are two methods to solve the problem.
- first, you can complete the pathway annotation, because the kegg annotation also has hierarchical relationship.
- second, you can convert
phyloseqclass to data.frame. sincediff_analysisalso supportdata.frameas input.
library(MicrobiotaProcess)
tax <- data.frame(ps_xx1@tax_table, check.names=FALSE)
sampleda <- data.frame(ps_xx1@sam_data, check.names=FALSE)
otuda <- data.frame(ps_xx1@otu_table, check.names=FALSE)
featuretab <- merge(otuda, tax, by=0)
rownames(featuretab) <- as.vector(featuretab$Rank1)
featuretab$Row.names <- NULL
featuretab$Rank1 <- NULL
featuretab <- data.frame(t(featuretab)*100, check.names=FALSE)
diffres <- diff_analysis(obj=featuretab,
sampleda=sampleda,
classgroup="Group",
#alltax=FALSE,
mlfun="lda",
filtermod="pvalue",
standard_method=NULL,
firstcomfun = "kruskal.test",
firstalpha=0.05,
strictmod=TRUE,
secondcomfun = "wilcox.test",
subclmin=3,
type="others",
subclwilc=TRUE,
secondalpha=0.01,
lda=3)
diffres
the taxda was not be provided, so the result is not be visualized by ggdiffcalde. But you can use ggdiffbox to visualize it.
p <- ggdiffbox(
diffres,
l_xlabtext="relative abundance (%)",
box_notch=FALSE,
colorlist=c("#00AED7", "#009E73")
)
p

you also can use ggdifftaxbar to visualize each discriminative feature by barplot.
ggdifftaxbar(
obj=diffres,
xtextsize=1.5,
output="function_biomarkder_barplot",
coloslist=c("#00AED7", "#009E73")
)
Thanks so much for your help! This method solves my issue perfectly! Cheers!