p value is not generated
Hello,
Sorry if this was addressed before, but I couldn't find an output option to generate p-values for the interactions. I am asking because in the web interface of IntaRNA, the output has a p-value for every interaction, and I want to filter for significant results because otherwise there are too many. Is there a way to do this in the command line version as well? Or do you have a suggestion for how to filter the results? (I am looking at a bacterial small RNA and its interactions across its genome, including coding regions and intergenic regions.)
Thank you very much!
Hi @efisso
thanks for your question.
Computing p-values is not a standard option, since it requires a sound sampling of the possible energy distribution, which is organism and task specific.
Within the webserver, p-values are computed for whole genome target predictions, assuming the set of mfe predictions for all genes to be a reasonable sample. The computation itself is done using our p-value R script, which is part of the package or can be downloaded from GitHub using the given link. The script takes an IntaRNA CSV output and assumes the given data set to be the sound energy sample, as discussed above, in order to fit a distribution and compute the respective p-values.
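To illustrate the idea of fitting a distribution to a set of predicted energies and reading p-values from it, here is a minimal Python sketch. It is not the R script's actual procedure: it assumes a Gumbel distribution fitted by the method of moments, a semicolon-separated CSV, and the column names `id1` and `E`, all of which should be checked against your own output. The function names and the demo data are hypothetical.

```python
import csv
import io
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def read_energies(handle, id_column="id1", energy_column="E", delimiter=";"):
    """Read (id, energy) pairs from a CSV handle.

    Column names and delimiter are assumptions; adjust them to your file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [(row[id_column], float(row[energy_column])) for row in reader]


def gumbel_p_values(energies):
    """Return one p-value per energy from a moment-fitted Gumbel distribution.

    Energies are negated so that stronger (more negative) interactions become
    larger scores; the p-value is the upper tail probability of the score.
    """
    if len(energies) < 2:
        raise ValueError("need at least two energies to fit a distribution")
    scores = [-e for e in energies]
    spread = statistics.stdev(scores)
    if spread == 0:
        raise ValueError("all energies are identical; cannot fit a distribution")
    beta = spread * math.sqrt(6) / math.pi                # scale parameter
    mu = statistics.mean(scores) - EULER_GAMMA * beta     # location parameter
    # P(S >= s) = 1 - exp(-exp(-(s - mu) / beta)), computed via expm1 for accuracy
    return [-math.expm1(-math.exp(-(s - mu) / beta)) for s in scores]


# Made-up demo data, standing in for one mfe prediction per target gene.
demo_csv = """id1;id2;E
geneA;sRNA;-18.2
geneB;sRNA;-7.1
geneC;sRNA;-6.4
geneD;sRNA;-8.0
geneE;sRNA;-5.9
geneF;sRNA;-7.6
"""

rows = read_energies(io.StringIO(demo_csv))
p_values = gumbel_p_values([energy for _, energy in rows])

# Keep only targets below a chosen threshold (0.05 is an arbitrary example).
significant = [(t, e, p) for (t, e), p in zip(rows, p_values) if p <= 0.05]
for target, energy, p in sorted(significant, key=lambda item: item[2]):
    print(f"{target}\t{energy}\t{p:.4g}")
```

As discussed above, the p-values are only as meaningful as the sample: the input should contain the predictions for the whole background set (for example all genes), not a pre-filtered subset.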
Hope that helps and provides sufficient information to understand the limitations of the approach.
Best, Martin
Thank you very much for your fast answer! This is definitely helpful, I will check it out. (I think the best target list would be the transcriptome in this case?)
Best, Elif
In the end it is all a model with its limitations and assumptions. For instance, within the webserver, only regions around the start codons are used, not the transcripts and not the whole mRNA. This allows for an efficient computation since it limits the RNA sequences to be considered but provides "only" a modelling of "transcription start interfering interactions".
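If you want to build a comparable target set yourself, a rough Python sketch for cutting out windows around annotated start codons could look like the following. The window sizes, the coordinate convention (0-based position of the first base of the start codon on the forward strand) and the function names are assumptions for illustration only, not the webserver's settings, and the sketch treats the genome as linear.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def start_codon_window(genome, start, strand, upstream=75, downstream=75):
    """Return the region around a start codon in gene orientation.

    genome     : forward-strand DNA sequence
    start      : 0-based forward-strand index of the first base of the start codon
    strand     : "+" or "-"
    upstream   : number of bases before the start codon to include
    downstream : number of bases to include, counted from the first codon base

    Windows are clipped at the sequence ends (no circular wrap-around).
    """
    if strand == "+":
        lo = max(0, start - upstream)
        hi = min(len(genome), start + downstream)
        return genome[lo:hi]
    if strand == "-":
        # On the minus strand, the gene runs towards lower coordinates.
        lo = max(0, start - downstream + 1)
        hi = min(len(genome), start + upstream + 1)
        return reverse_complement(genome[lo:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy example with a made-up sequence: ATG at index 6 on the forward strand.
toy_genome = "AAACCCATGGGGTTT"
plus_window = start_codon_window(toy_genome, 6, "+", upstream=3, downstream=3)
minus_window = start_codon_window(toy_genome, 7, "-", upstream=3, downstream=3)
```

Each window can then be written to a FASTA file and used as the target set, so that every gene contributes one sequence of similar length to the energy sample.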
So on the one hand, yes: the transcriptome is a good idea. On the other hand: what kind of interaction are you interested in, or do you already have a model of the regulation caused by the interaction? This might strongly influence the data set and the time required for the interaction computation.
Hope that helps, best, Martin