two-step approach with fast pre-filter
Following the results of https://doi.org/10.1093/bib/bby032 and repeating its benchmark, we want to test whether combining IntaRNA with a fast pre-filter tool such as GUUGle improves IntaRNA's runtime and prediction accuracy.
- run GUUGle or a similar fast method to identify potential interactions
- merge the interaction ranges into prediction ranges (e.g. on the target only)
- run IntaRNA on the pre-filtered ranges only
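
The range-merging step could look roughly like the sketch below. It assumes the pre-filter hits have already been parsed into 1-based, inclusive (start, end) pairs on the target; the function name `merge_ranges` and the `flank` and `seq_len` parameters are hypothetical and not part of GUUGle or IntaRNA.

```python
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end) on the target


def merge_ranges(hits: Iterable[Range], flank: int = 0,
                 seq_len: Optional[int] = None) -> List[Range]:
    """Pad each hit by `flank` positions and merge overlapping or adjacent ranges."""
    padded = []
    for start, end in hits:
        s = max(1, start - flank)          # do not run past the 5' end
        e = end + flank
        if seq_len is not None:
            e = min(seq_len, e)            # do not run past the 3' end
        padded.append((s, e))

    merged: List[Range] = []
    for s, e in sorted(padded):
        if merged and s <= merged[-1][1] + 1:
            # overlaps or directly follows the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    print(merge_ranges([(10, 20), (18, 30), (50, 60)], flank=5, seq_len=62))
```

How the merged ranges are then handed to IntaRNA (e.g. as region constraints or as extracted subsequences) is left open here and would need to be checked against the IntaRNA documentation.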
guugle-based with seed-extension
- ASSA https://doi.org/10.1142/S0219720018400012
- pre-filter query sequences using their accessibility profiles (precomputed or via RNAplfold or IntaRNA)
- filter matches based on maximal GU/AU content (check statistics on seeds of genome-wide screens)
- generate target region list
- evaluate list features (length distribution, estimated/modelled time for ED computation, etc.) and compare to the full target set
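
A minimal sketch of the GU/AU content filter, assuming each match is available as two equally long, gap-free sequence segments given 5'->3' that pair in antiparallel orientation. The names `weak_pair_fraction` and `keep_match` and the default threshold of 0.5 are hypothetical; a sensible cutoff would have to come from the seed statistics of genome-wide screens.

```python
WEAK_PAIRS = {("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
STRONG_PAIRS = {("G", "C"), ("C", "G")}


def weak_pair_fraction(query: str, target: str) -> float:
    """Fraction of AU and GU pairs in a gap-free duplex.

    Both segments are given 5'->3'; the target is reversed so that
    position i of the query pairs with position i of the reversed target.
    """
    q = query.upper().replace("T", "U")
    t = target.upper().replace("T", "U")[::-1]
    if not q or len(q) != len(t):
        raise ValueError("segments must be non-empty and of equal length")

    weak = 0
    for pair in zip(q, t):
        if pair in WEAK_PAIRS:
            weak += 1
        elif pair not in STRONG_PAIRS:
            raise ValueError(f"non-complementary pair {pair}")
    return weak / len(q)


def keep_match(query: str, target: str, max_weak: float = 0.5) -> bool:
    """Keep a match only if its AU/GU fraction does not exceed max_weak (assumed cutoff)."""
    return weak_pair_fraction(query, target) <= max_weak


if __name__ == "__main__":
    # G-C, G-U, C-G, A-U, U-A: three of five pairs are AU or GU
    print(weak_pair_fraction("GGCAU", "AUGUC"))
    print(keep_match("GGCAU", "AUGUC"))
```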