universe parameter for background genes
Hello,
I am using the enrichGO function from the clusterProfiler R package. It has a parameter called universe, which lets us supply our own background genes. But in my analysis, the results are exactly the same whether I supply background genes or not. This is my code:
ENSGO1 <- enrichGO(datname1$entrezgene_id, OrgDb = "org.Hs.eg.db", keyType = "ENTREZID", ont = "ALL", universe = ensIDall, pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.2)
datname1$entrezgene_id is a list of genes as Entrez IDs, and ensIDall is my background genes, also as Entrez IDs. Why are the results the same? Is this a problem? I repeated this analysis with 3 different examples, and their results are also unchanged when the background genes are added. Thank you.
Elif.
How is ensIDall defined? Does it include all human genes in org.Hs.eg.db?
That file includes all of the genes from my data. It is cancer data and has nearly 30,000 genes with Entrez IDs.
Did you notice the explanation of the parameter universe? If you leave this parameter alone, clusterProfiler will use all genes with GO annotation in org.Hs.eg.db. So, there is a possibility that your background genes are actually no different from the ones in org.Hs.eg.db.
OK. I cannot obtain the genes from org.Hs.eg.db; how can I reach them? I also tried this analysis with another set of genes: I have 2806 genes with Entrez IDs, and I use 22410 genes as background. All of these genes may be part of the org.Hs.eg.db genes, but the p-values should still differ because of the number of background genes used. I tried this analysis with 6000 of the background genes, and the results are again the same.
OK. I cannot obtain the genes from org.Hs.eg.db; how can I reach them?
GO_DATA <- clusterProfiler:::get_GO_data("org.Hs.eg.db", "ALL", "ENTREZID")
extID <- DOSE:::ALLEXTID(GO_DATA)
The intersection, extID <- intersect(extID, universe), will be used as background.
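To make the intersection step concrete, here is a minimal base R sketch with toy IDs. The vectors annotated_ids and user_universe are hypothetical stand-ins for extID and universe; no real annotation data is loaded.

```r
# Toy Entrez IDs stored as character; all values are made up for illustration.
annotated_ids <- c("1", "2", "3", "4", "5")  # stands in for extID
user_universe <- c("3", "4", "5", "6", "7")  # stands in for universe

# Genes in the user universe that have no annotation ("6", "7") drop out,
# and annotated genes outside the user universe ("1", "2") drop out too.
background <- intersect(annotated_ids, user_universe)
print(background)
print(length(background))
```

So the effective background can be much smaller than the vector that was passed in, which is worth checking with length() before comparing runs.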
18675 genes are common between my data and org.Hs.eg.db. In my analysis, whether I give 6000 genes (which leaves 3233 common genes) or 18675 genes, the results do not change. But I think they should not be exactly the same; there should be a difference at least in the p-values. Thank you for your help, and if you want to perform the same analysis I can share my data.
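The expectation that p-values should move with the background size can be checked with a small base R sketch. It assumes the enrichment is scored with a one-sided hypergeometric test, which is the usual choice for over-representation analysis. The helper enrich_p and the counts k, M and n are hypothetical; only the two background sizes (18675 and 3233) come from this thread. For simplicity the sketch holds k, M and n fixed, although in a real run they would also shrink with the background.

```r
# One-sided hypergeometric p-value: probability of seeing at least k term
# genes among n query genes, when M of the N background genes carry the term.
# enrich_p is a hypothetical helper, not a clusterProfiler or DOSE function.
enrich_p <- function(k, M, n, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Hypothetical counts, held fixed so that only the background size N changes.
p_large <- enrich_p(k = 20, M = 200, n = 500, N = 18675)
p_small <- enrich_p(k = 20, M = 200, n = 500, N = 3233)

print(c(large_background = p_large, small_background = p_small))
```

With the larger background the same overlap looks far more surprising, so identical p-values across two different backgrounds suggest that the universe argument was not actually applied.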
I noticed that the code logic for universe in DOSE::enricher_internal was changed 3 weeks ago, and I don't have time to check if that change is problematic at the moment. You might want to test it with older versions of DOSE and clusterProfiler.
Or try updating DOSE and clusterProfiler to the latest version (possibly the development version)?
OK, thank you for your help and effort. I downloaded these packages 2 weeks ago, so I think I have the latest version.
Thank you for your help, and if you want to perform the same analysis I can share my data.
Can you share your data? I want to reproduce your issue in my environment, which has older versions of DOSE and clusterProfiler installed.
Which version of DOSE do you have installed?
Hello, I repeated my analysis and an error occurred about the universe parameter. I had converted my genes from Ensembl IDs to Entrez IDs to perform enrichment analysis with clusterProfiler. This error did not occur last week, but today it said "universe parameter must be a character vector", so I converted my background genes from integer to character. The results are different now. I am sharing my data so you can try it. Thank you for your help. Without background genes I have 8 terms; with background genes I have 3 terms. data1.txt backround_genes.txt
I'm guessing you set options(enrichment_force_universe = TRUE) by chance last week so that the universe was ignored by DOSE:::enricher_internal? See here: https://github.com/YuLab-SMU/DOSE/blob/37e558a7067262ab732dca530377503359b3b87a/R/enricher_internal.R#L64
Maybe, but as far as I remember, I did not change any parameter. This error occurred only once, and I found the mistake. This package works only with Entrez IDs, right? These IDs are numbers, so I need to convert them to character. But before I added the universe parameter, clusterProfiler worked with "integer".
@EEmanetci We expect gene IDs to be stored in 'character' mode.
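A minimal base R sketch of that conversion, using made-up ID values and hypothetical variable names. Going through as.integer() first is a precaution in case the IDs are stored as doubles, which may otherwise be rendered in scientific notation.

```r
# Hypothetical Entrez IDs held as numbers (doubles here, as read from a file).
universe_num <- c(7157, 672, 100000)

# Convert to integer first, then to character, so every ID is written out
# as plain digits.
universe_chr <- as.character(as.integer(universe_num))

print(universe_chr)
print(is.character(universe_chr))
```

The same conversion would apply to the query gene vector, so that both sides of any ID comparison are character.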
If options(enrichment_force_universe = TRUE), then the universe is what you passed to this argument.
Otherwise, it is the intersection of the universe with all the genes that have annotations.
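The two cases can be sketched in base R as follows. The function resolve_background is a hypothetical illustration of the behaviour described here, not the actual code in DOSE; only the option name enrichment_force_universe is taken from this thread, and the IDs are toy values.

```r
# Hypothetical helper mirroring the described rule; not DOSE's own code.
resolve_background <- function(universe, annotated_ids) {
  if (isTRUE(getOption("enrichment_force_universe"))) {
    universe                             # use the universe exactly as passed
  } else {
    intersect(universe, annotated_ids)   # keep only annotated genes
  }
}

annotated_ids <- c("1", "2", "3")  # toy genes that have annotations
universe <- c("2", "3", "4")       # toy user-supplied background

options(enrichment_force_universe = FALSE)
bg_default <- resolve_background(universe, annotated_ids)

options(enrichment_force_universe = TRUE)
bg_forced <- resolve_background(universe, annotated_ids)

options(enrichment_force_universe = NULL)  # restore the unset state

print(bg_default)
print(bg_forced)
```

Checking getOption("enrichment_force_universe") in the session before running the enrichment is a quick way to see which of the two cases applies.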