scores and p-values
Apologies, wasn't sure if this was the best place to post, but:
- Is there a way of extracting a column of p-values along with the positions?
- Is there documentation on what the score represents and what is an acceptable threshold?
Hi @mistrm82, this is a fine place to post.
Re 1 -- no, there is only the option to extract scores with positions:
motif_pos <- matchMotifs(example_motifs, peaks, genome = "hg19",
                         out = "positions")
Re 2 -- This is a port of the MOODS C++ package (https://github.com/jhkorhonen/MOODS), so the documentation and/or papers for that package might be useful (e.g. https://ieeexplore.ieee.org/document/4803829/?reload=true, https://academic.oup.com/bioinformatics/article/25/23/3181/215705, https://www.cs.helsinki.fi/group/pssmfind/).
In terms of the p-value vs. score, the package finds the score threshold that would correspond to a certain p-value (in terms of the probability of a random sequence having a score that high). It does not then find the p-value for each potential motif site.
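To make the relationship concrete, here is a minimal base-R sketch of the idea, not the package's actual code. It enumerates every sequence for a tiny made-up motif, computes each sequence's log-odds score and its probability under a background model, and then converts a p-value cutoff into the smallest score whose tail probability does not exceed that cutoff. The toy matrix, the uniform background, the log2 scaling, the rounding step, and the helper names `score_to_pvalue` and `pvalue_to_threshold` are all assumptions for illustration; none of them are motifmatchr or MOODS functions, and brute-force enumeration is only feasible here because the motif is four positions long.

```r
# Toy position probability matrix: rows are A, C, G, T; columns are motif
# positions. The values are invented for illustration only.
ppm <- matrix(
  c(0.70, 0.10, 0.10, 0.10,   # position 1 favours A
    0.10, 0.70, 0.10, 0.10,   # position 2 favours C
    0.10, 0.10, 0.70, 0.10,   # position 3 favours G
    0.25, 0.25, 0.25, 0.25),  # position 4 is uninformative
  nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Assumed uniform background; each row of ppm is divided by its base frequency.
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
pwm <- log2(ppm / bg)

# Enumerate every possible sequence as a row of base indices (1 = A, ..., 4 = T).
n_pos <- ncol(pwm)
seqs <- as.matrix(expand.grid(rep(list(1:4), n_pos)))

# Score of each sequence: sum of the log-odds entries it selects.
# Rounding merges scores that differ only by floating-point noise.
site_score <- round(
  rowSums(sapply(seq_len(n_pos), function(j) pwm[seqs[, j], j])), 6
)

# Probability of each sequence under the background model.
site_prob <- apply(sapply(seq_len(n_pos), function(j) bg[seqs[, j]]), 1, prod)

# P(random sequence scores at least s) under the background.
score_to_pvalue <- function(s) sum(site_prob[site_score >= s])

# Smallest score whose tail probability is at most p (Inf if none qualifies).
pvalue_to_threshold <- function(p) {
  candidates <- sort(unique(site_score))
  tail_p <- vapply(candidates, score_to_pvalue, numeric(1))
  ok <- candidates[tail_p <= p]
  if (length(ok) == 0) Inf else min(ok)
}

threshold <- pvalue_to_threshold(0.05)
hits <- site_score >= threshold  # sites that would be reported at this cutoff
```

In this sketch the cutoff is applied once to obtain `threshold`, and every site scoring at or above it is reported. That matches the behaviour described above: the p-value determines the threshold, but no per-site p-value is attached to the reported positions.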
Thanks @AliciaSchep. So what you are saying is that p-values are not derived for each individual site? In that case, I wouldn't need the p-values, since I was treating each site as an independent test and planning to apply multiple-testing correction.
I'll take a look at those links to get a better feel for the score values.
For anyone else looking for this information, take a look at the following links:
https://github.com/jhkorhonen/MOODS/issues/12#issuecomment-405912018
https://github.com/jhkorhonen/MOODS/wiki/Brief-theoretical-introduction
It would be very helpful to include direct links to some of these pages in the documentation, or a simple description in the package help pages themselves, as it is difficult to find a clear explanation. Reading the papers isn't sufficient: I couldn't figure out which number the software reports as the score until I dug through these GitHub issues.