
Compatibility of createKEGGdb with keyType option of clusterProfiler::enrichKEGG function

Open thegrebe opened this issue 2 years ago • 0 comments

Hello,

Thanks for this useful package!

I have some questions on what exactly is stored in the resulting KEGG.db, and how that relates to the options of clusterProfiler::enrichKEGG. enrichKEGG has an option keyType, which accepts kegg, ncbi-geneid, ncbi-proteinid or uniprot.


Background/context

I would like a solution for running KEGG enrichment analysis starting from gene symbols (SYMBOL), and I want to be able to use the same solution for any arbitrary species.

From this reply https://github.com/YuLab-SMU/clusterProfiler/issues/108#issuecomment-336784558

KEGG id and ENTREZID are the same for only some of the species, but not always the same.

and this blog post https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/

A rule of thumb for the ‘kegg’ ID is entrezgene ID for eukaryote species and Locus ID for prokaryotes.

I conclude that KEGG IDs are not reliable enough, or at least not sufficiently well described, for my use. I would therefore prefer to use ncbi-geneid.


However, when opening the sqlite database created through createKEGGdb, I only see a field gene_or_orf_id in table pathway2gene.
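For anyone who wants to reproduce this kind of inspection, here is a minimal sketch using Python's standard sqlite3 module. It is only an illustration under stated assumptions: the helper names describe_table and sample_ids are hypothetical, the pathway_id column and the placeholder rows are assumptions (only the pathway2gene table and its gene_or_orf_id field are taken from what I saw), and an in-memory stand-in is used so the snippet runs on its own. For a real check, the connection would instead point at the SQLite file shipped inside the generated package.

```python
import sqlite3


def describe_table(conn, table):
    """Return (column_name, declared_type) pairs for the given table."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def sample_ids(conn, table, column, limit=5):
    """Return up to `limit` distinct values of a column, sorted for stable output."""
    query = f"SELECT DISTINCT {column} FROM {table} ORDER BY {column} LIMIT ?"
    return [row[0] for row in conn.execute(query, (limit,)).fetchall()]


# Stand-in database: the schema and rows below are placeholders, not real KEGG data.
# Assumption: pathway2gene has a pathway_id column next to gene_or_orf_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pathway2gene (pathway_id TEXT, gene_or_orf_id TEXT)")
conn.executemany(
    "INSERT INTO pathway2gene VALUES (?, ?)",
    [("path00001", "id_1"), ("path00001", "id_2"), ("path00002", "id_1")],
)

columns = describe_table(conn, "pathway2gene")
ids = sample_ids(conn, "pathway2gene", "gene_or_orf_id")

print(columns)  # column names and declared types
print(ids)      # a few distinct identifiers to eyeball their format
```

Looking at a handful of distinct gene_or_orf_id values this way shows what the identifiers look like, but it does not by itself tell which identifier type they are, which is why I am asking below.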

Questions:

  • What is the gene_or_orf_id stored in the KEGG.db database? Is it a KEGG ID?
  • Can I use createKEGGdb to create a KEGG.db package, and then use it with clusterProfiler::enrichKEGG with keyType = "ncbi-geneid" (and use_internal_data = TRUE)?

Thank you in advance for your help. All the best,

thegrebe, Mar 15 '23 16:03