
RcisTarget::addSignificantGenes error

Open joel-tuberosa opened this issue 3 years ago • 5 comments

Hello,

I would like to perform an enrichment analysis with the following data:

target_genes - a vector of gene names corresponding to the tested set

motif_rankings - the loaded database mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather downloaded from here

motifAnnotations_mgi - annotation data loaded from the package with data(motifAnnotations_mgi)

I am running the following commands:

motifs_AUC <- calcAUC(target_genes, motif_rankings)
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,  motifAnnot=motifAnnotations_mgi)
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable, 
                                                   geneSets=target_genes,
                                                   rankings=motif_rankings, 
                                                   nCores=1,
                                                   method="aprox")

And I got this error message from the last command:

Error in data.frame(row.names = motifNames, rankings[, geneSet]) : 
  duplicate row.names: 16388, 13294, 17330, 17112, 4188, 16530, 16844, 18101, 17737, 18186, 16084, 18886, 11338, 12655, 16219, 18026, 15061, 16371, 14701, 17214, 18246, 16884, 14225, 6681, 18323, 17761, 17628, 16022, 17015, 18869, 15726, 16565, 16104, 14604, 16384, 15421, 16625, 16326, 15902, 17124, 18335, 18696, 9916, 15847, 14092, 17177, 15993, 17593, 16026, 18152, 14512, 16552, 16644, 19879, 18012, 17748, 18443, 16515, 17100, 17378, 17796, 19198, 18076, 16489, 18470, 14162, 17199, 18253, 16231, 17396, 18081, 17258, 15458, 17295, 15894, 17249, 18312, 17144, 13580, 8484, 16764, 15581, 12946, 19774, 15787, 18527, 18199, 18438, 17575, 17425, 16641, 11742, 18372, 17682, 16088, 17187, 15967, 18070, 17644, 14814, 14675, 17816, 18090, 14718, 17172, 14284, 18289, 18512, 16494, 17723, 15823, 18852, 14540, 17799, 15400, 11594, 17008, 18074, 16253, 17293, 18373, 14628, 13187, 18236, 14654, 17097, 16927, 15662, 11932, 17926, 18632, 18596, 17650, 17991, 17725, 16096, 16249, 10919, 17093, 1

Do you have an idea how to fix this?

Thank you in advance.

Joël

joel-tuberosa avatar Aug 16 '22 16:08 joel-tuberosa

on 17 Aug

I encountered the same problem, and after reviewing the source code I found that the problem was "motif_rankings-mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather". It has a lot of duplicate motif names in it. The solution is to use the old file named "mm9-tss-centered-10kb-10species.mc9nr.feather" from https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/

ZYT-ZhangYunTao19941116 avatar Oct 15 '22 11:10 ZYT-ZhangYunTao19941116

@joel-tuberosa, did you get anywhere with this other than changing to an old annotation file? I am having the exact same error.

davidsanin avatar Apr 19 '23 21:04 davidsanin

I just changed to the old file and then everything went well

ZYT-ZhangYunTao19941116 avatar Apr 20 '23 00:04 ZYT-ZhangYunTao19941116

I think I know what the problem is: the new and old versions of the databases store the motif names in different column positions. In the old databases it is the first column (column name 'features'), while in the new ones it is the last column (column name 'motifs').

Unfortunately, the code for 03_addSignificantGenes.R assumes that the first column contains the motif names (my comments):

.getSignificantGenes <- function(geneSet,
                                 rankings,
                                 signifRankingNames=NULL,
                                 method="iCisTarget",
                                 maxRank=5000,
                                 plotCurve=FALSE,
                                 genesFormat=c("geneList", "incidMatrix"),
                                 nCores=1,
                                 digits=3,
                                 nMean=50)
{...
  # the motifRankings S4 object becomes a dataframe
  rankings <- getRanking(rankings)
  # the 'indices' are obtained from the FIRST column!!!
  indexCol <- colnames(rankings)[1]
  ...
  # this will give you now a series of ranking values, as character... and not necessarily unique
  motifNames <- as.character(unlist(rankings[,indexCol]))
  # now you get repeated row.names as you have a list of numbers instead of unique motif names:
  gSetRanks <- data.frame(row.names=motifNames, rankings[,geneSet])
  # and this is where the error originates
  ...
}
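
The failure mode described in the comments above can be reproduced in base R without RcisTarget. This is only a minimal sketch: the toy table, its column names and its values are made up, and it mimics the "new" layout (gene columns first, motif names last) rather than loading a real database.

```r
# Toy stand-in for a "new"-layout rankings table: gene columns first,
# the motif-name column last. All names and values here are made up.
rankings <- data.frame(
  geneA  = c(3L, 1L, 3L),
  geneB  = c(2L, 2L, 1L),
  motifs = c("motif_1", "motif_2", "motif_3"),
  stringsAsFactors = FALSE
)

# Taking the first column as the index yields rank values, which can repeat.
indexCol   <- colnames(rankings)[1]
motifNames <- as.character(unlist(rankings[, indexCol]))

# data.frame() refuses duplicated row names; capture the error message
# instead of stopping.
result <- tryCatch(
  data.frame(row.names = motifNames, rankings[, "geneB"]),
  error = function(e) conditionMessage(e)
)
print(result)
```

With real data the first column would hold one gene's ranks across all motifs, so repeated values there would trigger the same check.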

I think this was intended to be handled before, within importRankings, where it does:

indexCol <- intersect(allColumns, c('motifs', 'tracks', 'features'))#  [1]
if(verbose) message("Using the column '", indexCol, "' as feature index for the ranking database.")

So in principle the lookup is independent of position. However, I think indexCol is not passed on to cisTarget, and it is also clear from the comment that the motif names are expected to be at the beginning of the data frame.

However, the message printed by importRankings is not what I expect. I have been using the Drosophila motifRankings, both "new" and "old". When I import the old one, I get the expected message:

> motifRankings_old <- importRankings("resources/motifdbs/old/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
Using the column 'features' as feature index for the ranking database.

But with the new, I get:

> motifRankings_new <- importRankings(".../.../dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")
Using the column '128up' as feature index for the ranking database.

'128up' is the name of the first Drosophila gene by alphanumeric ordering... but this cannot be the result of intersect(allColumns, c('motifs', 'tracks', 'features'))... I must be missing something ¯\_(ツ)_/¯
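
For reference, a name-based lookup with intersect() does not depend on column order, which is why the '128up' message is surprising. The sketch below uses hypothetical column names ('geneA', 'geneB') and does not call importRankings, so it says nothing about what the installed package version actually runs.

```r
# Hypothetical column layouts; only the position of the index column differs.
oldColumns <- c("features", "geneA", "geneB")
newColumns <- c("geneA", "geneB", "motifs")

knownIndexNames <- c("motifs", "tracks", "features")

# Name-based lookup finds the index column wherever it sits.
oldIndex <- intersect(oldColumns, knownIndexNames)
newIndex <- intersect(newColumns, knownIndexNames)

# Positional lookup, by contrast, returns whatever happens to come first.
firstOfNew <- newColumns[1]

print(c(old = oldIndex, new = newIndex, first = firstOfNew))
```

A message naming a gene as the index column would therefore be consistent with a positional lookup rather than with the intersect() line quoted above, although that is only a guess.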

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:

motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)
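
If dplyr is not available, the same reordering can be sketched in base R. The example uses a made-up toy data frame rather than a real rankings object. With a real one, the table would be the rankings slot as in the line above, and the indexing may need adjusting if that slot holds a tibble or data.table.

```r
# Toy table in the "new" layout (made-up names and values).
rankings <- data.frame(
  geneA  = c(3L, 1L, 2L),
  geneB  = c(2L, 3L, 1L),
  motifs = c("motif_1", "motif_2", "motif_3"),
  stringsAsFactors = FALSE
)

# Move 'motifs' to the front, keeping the remaining columns in order.
reordered <- rankings[, c("motifs", setdiff(colnames(rankings), "motifs"))]

# Sanity checks before handing the table to downstream code:
# the index column comes first and its values are unique.
stopifnot(colnames(reordered)[1] == "motifs")
stopifnot(anyDuplicated(reordered$motifs) == 0)

print(colnames(reordered))
```

The two checks at the end are cheap to run on the real table as well and would flag the duplicate row.names problem before addSignificantGenes is called.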

Hope this helps.

jdenavascues avatar Apr 25 '23 14:04 jdenavascues

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:

motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)

This does it! Thanks for the advice!

davidsanin avatar Apr 25 '23 15:04 davidsanin