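GSON object causes an error in the latest version of enrichKEGG
I have downloaded and installed the GitHub versions of clusterProfiler, DOSE, and HDO.db.
When running enrichKEGG, I get the following error: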
> kk <- gson_KEGG('mmu')
Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
> KEGG_enrich = enrichKEGG(gene = id_transform[,1],
+ organism=kk,
+ use_internal_data = TRUE # the error occurs with or without this line
+ )
Error in (function (cl, name, valueClass) :
assignment of an object of class “NULL” is not valid for @‘keytype’ in an object of class “enrichResult”; is(value, "character") is not TRUE
# My input gene list looks like this:
# id_transform
#SETBP1 240427
#CITED1 12705
#RIMS4 241770
#MUC12 102633301
#SMO 319757
#CALCB 116903
#TMEM158 72309
Going back to inspect the object:
> kk@keytype
NULL
In principle, the gson object created by gson_KEGG should carry the keytype ENTREZID (?)
> gson_KEGG
function (species, KEGG_Type = "KEGG", keyType = "kegg")
{
x <- download_KEGG(species, KEGG_Type, keyType)
gsid2gene <- setNames(x[[1]], c("gsid", "gene"))
gsid2name <- setNames(x[[2]], c("gsid", "name"))
version <- kegg_release(species)
gson(gsid2gene = gsid2gene, gsid2name = gsid2name, species = species,
gsname = "KEGG", version = version, accessed_date = as.character(Sys.Date(),
keytype = "ENTREZID"))
}
<bytecode: 0x1f8dcc08>
<environment: namespace:clusterProfiler>
Could the author please explain the cause?
Hi, a couple of things:
First of all, why do you generate a GSON object with all mouse pathways?
Related to this, please check the help page of enrichKEGG, because your call contains some mistakes. Note that the argument organism should be the KEGG abbreviation of the organism you are analyzing, in your case mmu (and it should NOT be the GSON object!).
The argument gene should be a (character) vector of entrezids.
It is also recommended to leave the argument use_internal_data at its default setting FALSE (so up-to-date information is being downloaded from the KEGG website).
Thus the code below, in which the 7 ids are used that you listed, will do what you intended to do!
> library(clusterProfiler)
>
> id_transform <- c("240427","12705","241770","102633301","319757","116903","72309")
> class(id_transform)
[1] "character"
>
> KEGG_enrich = enrichKEGG(gene = id_transform,
+ organism="mmu",
+ use_internal_data = FALSE
+ )
>
>
> KEGG_enrich
#
# over-representation test
#
#...@organism mmu
#...@ontology KEGG
#...@keytype kegg
#...@gene chr [1:7] "240427" "12705" "241770" "102633301" "319757" "116903" "72309"
#...pvalues adjusted by 'BH' with cutoff <0.05
#...5 enriched terms found
'data.frame': 5 obs. of 11 variables:
$ category : chr "Environmental Information Processing" "Human Diseases" "Organismal Systems" "Organismal Systems" ...
$ subcategory: chr "Signal transduction" "Cancer: specific types" "Circulatory system" "Development and regeneration" ...
$ ID : chr "mmu04340" "mmu05217" "mmu04270" "mmu04360" ...
$ Description: chr "Hedgehog signaling pathway - Mus musculus (house mouse)" "Basal cell carcinoma - Mus musculus (house mouse)" "Vascular smooth muscle contraction - Mus musculus (house mouse)" "Axon guidance - Mus musculus (house mouse)" ...
$ GeneRatio : chr "1/2" "1/2" "1/2" "1/2" ...
$ BgRatio : chr "58/9710" "63/9710" "144/9710" "181/9710" ...
$ pvalue : num 0.0119 0.0129 0.0294 0.0369 0.0416
$ p.adjust : num 0.0388 0.0388 0.0499 0.0499 0.0499
$ qvalue : num 0.00681 0.00681 0.00875 0.00875 0.00875
$ geneID : chr "319757" "319757" "116903" "319757" ...
$ Count : int 1 1 1 1 1
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> as.data.frame(KEGG_enrich)[1:3,]
category subcategory ID
mmu04340 Environmental Information Processing Signal transduction mmu04340
mmu05217 Human Diseases Cancer: specific types mmu05217
mmu04270 Organismal Systems Circulatory system mmu04270
Description
mmu04340 Hedgehog signaling pathway - Mus musculus (house mouse)
mmu05217 Basal cell carcinoma - Mus musculus (house mouse)
mmu04270 Vascular smooth muscle contraction - Mus musculus (house mouse)
GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
mmu04340 1/2 58/9710 0.01191138 0.03880464 0.006807832 319757 1
mmu05217 1/2 63/9710 0.01293488 0.03880464 0.006807832 319757 1
mmu04270 1/2 144/9710 0.02944172 0.04989512 0.008753530 116903 1
>
Thank you for your detailed reply, and sorry for not writing in English earlier. Do you know why enrichKEGG does not support a gson object? I am confused because I saw the code below. @guidohooiveld https://github.com/YuLab-SMU/clusterProfiler/blob/2ab30a92f1791dce75f71ea29b71c33fc443d4a0/R/enrichKEGG.R#L45-L58
I added gson_file@keytype <- 'ENTREZID' before running enrichKEGG(), and the error disappeared. However, I am not sure whether the results are correct when the slot is set this way.
Regarding the input data, sorry for showing the wrong object. I showed id_transform before, but I actually passed id_transform[,1], which is a character vector. Thank you for pointing that out.
> head(id_transform)
id_transform
SP140 434484
SPATA32 328019
SAMD15 238333
FER1L6 631797
RERGL 632971
PHEX 18675
> head(id_transform[,1])
[1] "434484" "328019" "238333" "631797" "632971" "18675"
Sorry for my delayed reply!
Thanks for highlighting the relevant section in the source code of enrichKEGG. I now understand what you were trying to achieve, and I agree that the keytype slot of the GSON object kk is unexpectedly NULL.
Indeed, when the slot is set manually (as you did), enrichKEGG works as expected. See the code below.
> ## load library
> library(clusterProfiler)
>
> ## some ids
> id_transform <- c("240427","12705","241770","102633301","319757","116903","72309")
>
> ## generate GSON-object with pathway information
> kk <- gson_KEGG('mmu')
>
> ## use GSON as input: FAILS!
> KEGG_enrich = enrichKEGG(gene = id_transform,
+ organism=kk,
+ use_internal_data = FALSE)
Error in (function (cl, name, valueClass) :
assignment of an object of class “NULL” is not valid for @‘keytype’ in an object of class “enrichResult”; is(value, "character") is not TRUE
>
>
> ## check GSON-object
> kk
>> Gene Set: KEGG
>> 9710 genes annotated by 355 gene sets.
>> Species: mmu
>> Version: Release 110.0+/04-27, Apr 24
>
> ## note that slot keytype is NULL!
> str(kk)
Formal class 'GSON' [package "gson"] with 9 slots
..@ gsid2gene :'data.frame': 38640 obs. of 2 variables:
.. ..$ gsid: chr [1:38640] "mmu00010" "mmu00010" "mmu00010" "mmu00010" ...
.. ..$ gene: chr [1:38640] "103988" "106557" "110695" "11522" ...
..@ gsid2name :'data.frame': 355 obs. of 2 variables:
.. ..$ gsid: chr [1:355] "mmu01100" "mmu01200" "mmu01210" "mmu01212" ...
.. ..$ name: chr [1:355] "Metabolic pathways - Mus musculus (house mouse)" "Carbon metabolism - Mus musculus (house mouse)" "2-Oxocarboxylic acid metabolism - Mus musculus (house mouse)" "Fatty acid metabolism - Mus musculus (house mouse)" ...
..@ gene2name : NULL
..@ species : chr "mmu"
..@ gsname : chr "KEGG"
..@ version : chr "Release 110.0+/04-27, Apr 24"
..@ accessed_date: chr "2024-04-30"
..@ keytype : NULL
..@ info : NULL
>
> ## Fix, and check
> kk@keytype="kegg"
>
> str(kk)
Formal class 'GSON' [package "gson"] with 9 slots
..@ gsid2gene :'data.frame': 38640 obs. of 2 variables:
.. ..$ gsid: chr [1:38640] "mmu00010" "mmu00010" "mmu00010" "mmu00010" ...
.. ..$ gene: chr [1:38640] "103988" "106557" "110695" "11522" ...
..@ gsid2name :'data.frame': 355 obs. of 2 variables:
.. ..$ gsid: chr [1:355] "mmu01100" "mmu01200" "mmu01210" "mmu01212" ...
.. ..$ name: chr [1:355] "Metabolic pathways - Mus musculus (house mouse)" "Carbon metabolism - Mus musculus (house mouse)" "2-Oxocarboxylic acid metabolism - Mus musculus (house mouse)" "Fatty acid metabolism - Mus musculus (house mouse)" ...
..@ gene2name : NULL
..@ species : chr "mmu"
..@ gsname : chr "KEGG"
..@ version : chr "Release 110.0+/04-27, Apr 24"
..@ accessed_date: chr "2024-04-30"
..@ keytype : chr "kegg"
..@ info : NULL
>
>
> ## enrichKEGG now works!
> KEGG_enrich = enrichKEGG(gene = id_transform,
+ organism=kk,
+ use_internal_data = FALSE)
>
> KEGG_enrich
#
# over-representation test
#
#...@organism mmu
#...@ontology KEGG
#...@keytype kegg
#...@gene chr [1:7] "240427" "12705" "241770" "102633301" "319757" "116903" "72309"
#...pvalues adjusted by 'BH' with cutoff <0.05
#...5 enriched terms found
'data.frame': 5 obs. of 11 variables:
$ category : chr "Environmental Information Processing" "Human Diseases" "Organismal Systems" "Organismal Systems" ...
$ subcategory: chr "Signal transduction" "Cancer: specific types" "Circulatory system" "Development and regeneration" ...
$ ID : chr "mmu04340" "mmu05217" "mmu04270" "mmu04360" ...
$ Description: chr "Hedgehog signaling pathway - Mus musculus (house mouse)" "Basal cell carcinoma - Mus musculus (house mouse)" "Vascular smooth muscle contraction - Mus musculus (house mouse)" "Axon guidance - Mus musculus (house mouse)" ...
$ GeneRatio : chr "1/2" "1/2" "1/2" "1/2" ...
$ BgRatio : chr "58/9710" "63/9710" "144/9710" "181/9710" ...
$ pvalue : num 0.0119 0.0129 0.0294 0.0369 0.0416
$ p.adjust : num 0.0388 0.0388 0.0499 0.0499 0.0499
$ qvalue : num 0.00681 0.00681 0.00875 0.00875 0.00875
$ geneID : chr "319757" "319757" "116903" "319757" ...
$ Count : int 1 1 1 1 1
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
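The manual fix above can be wrapped in a small guard so that it does nothing once the slot is populated. This is only a sketch: ensure_keytype is a hypothetical helper name, not part of gson or clusterProfiler, and the default "kegg" is simply the value used in the session above.

```r
library(methods)

# Hypothetical helper (not part of gson or clusterProfiler):
# set the keytype slot only when it is NULL, and leave any
# existing value untouched.
ensure_keytype <- function(x, default = "kegg") {
  # Fail early if the object has no keytype slot at all.
  stopifnot(isVirtualClass("NULL") || TRUE, .hasSlot(x, "keytype"))
  if (is.null(slot(x, "keytype"))) {
    slot(x, "keytype") <- default
  }
  x
}
```

With clusterProfiler loaded, a call such as kk <- ensure_keytype(gson_KEGG("mmu")) would then replace the bare kk@keytype assignment, and the helper becomes a no-op if a later release already fills the slot.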
As you will see above, I opened an issue in the GitHub repository of the gson package:
https://github.com/YuLab-SMU/gson/issues/9
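A possible explanation for the NULL slot can be read off the gson_KEGG source printed in the first post: there, keytype = "ENTREZID" sits inside the parentheses of as.character(Sys.Date(), ...) rather than among the arguments of gson(). The base-R sketch below reproduces that pattern. fake_gson is a hypothetical stand-in for the real constructor, and whether moving the parenthesis is the fix the maintainers intend is an assumption.

```r
# Stand-in for gson(): returns its arguments so we can inspect what arrived.
# (Hypothetical, for illustration only; the real constructor is gson::gson.)
fake_gson <- function(accessed_date, keytype = NULL) {
  list(accessed_date = accessed_date, keytype = keytype)
}

# As printed in the thread: keytype is passed to as.character(),
# which absorbs it via `...`, so it never reaches the constructor.
as_printed <- fake_gson(
  accessed_date = as.character(Sys.Date(), keytype = "ENTREZID")
)

# With the closing parenthesis moved, keytype is a constructor argument.
moved_paren <- fake_gson(
  accessed_date = as.character(Sys.Date()),
  keytype = "ENTREZID"
)

is.null(as_printed$keytype)   # TRUE: the slot would stay NULL
moved_paren$keytype           # "ENTREZID"
```

No error or warning is raised in the first call, which would explain why the problem only surfaces later, when enrichKEGG copies the NULL keytype into the enrichResult object.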