Linking LTRs and internal regions for ERVs: Is further post-processing necessary?
Dear all,
I am looking into ERVs in the mouse genome (GRCm39), and I am a bit confused about which post-processing steps are relevant.
There is a script available that combines ERV LTRs with internal regions based on the names of the elements (https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13). However, I am not sure whether this is (still?) required, or whether ProcessRepeats already does this.
Here is one example from the .out file in which LTRs and internal regions are already linked via the ID column (the last number in each row):
 532  6.9  8.9  4.2  1  3122749  3123277  (192031002)  +  ERVB4_2-LTR_MM  LTR/ERVK     1   553     (0)  98
 689  5.8  9.9  0.8  1  3123278  3124349  (192029930)  +  ERVB4_2-I_MM    LTR/ERVK     1  1972  (6402)  98
 162  3.5  0.0  0.0  1  3124347  3124487  (192029792)  +  ERVB4_2-I_MM    LTR/ERVK  5825  5965  (2409)  98  *
2116  6.8  5.8  0.8  1  3124478  3126963  (192027316)  +  RLTR45-int      LTR/ERVK  1318  3204  (4040)  99
 626  2.6  2.9  2.2  1  3126958  3127550  (192026729)  +  RLTR45-int      LTR/ERVK  3390  3986  (3258)  99  *
2455  3.3  0.6  0.2  1  3127544  3129715  (192024564)  +  RLTR45-int      LTR/ERVK  5065  7244     (0)  99
 544  6.3  8.7  4.5  1  3129716  3130247  (192024032)  +  ERVB4_2-LTR_MM  LTR/ERVK     1   553     (0)  99
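In this particular case, joining by name would not even catch it: under ID 99, the RLTR45-int internal fragments are linked to an ERVB4_2-LTR_MM LTR, so the names do not match.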
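To see how often the ID column and a name-based join disagree, something along these lines might help. It is only a sketch: the helper names (parse_out_line, name_stem, group_by_id, mixed_name_ids) and the suffix-stripping regex are my own assumptions for illustration, not the logic of the published script or of ProcessRepeats. The column positions assume the standard whitespace-separated .out layout of the rows shown above.

```python
import re
from collections import defaultdict

# The seven rows from the example above, copied from the .out file.
OUT_ROWS = """\
532 6.9 8.9 4.2 1 3122749 3123277 (192031002) + ERVB4_2-LTR_MM LTR/ERVK 1 553 (0) 98
689 5.8 9.9 0.8 1 3123278 3124349 (192029930) + ERVB4_2-I_MM LTR/ERVK 1 1972 (6402) 98
162 3.5 0.0 0.0 1 3124347 3124487 (192029792) + ERVB4_2-I_MM LTR/ERVK 5825 5965 (2409) 98 *
2116 6.8 5.8 0.8 1 3124478 3126963 (192027316) + RLTR45-int LTR/ERVK 1318 3204 (4040) 99
626 2.6 2.9 2.2 1 3126958 3127550 (192026729) + RLTR45-int LTR/ERVK 3390 3986 (3258) 99 *
2455 3.3 0.6 0.2 1 3127544 3129715 (192024564) + RLTR45-int LTR/ERVK 5065 7244 (0) 99
544 6.3 8.7 4.5 1 3129716 3130247 (192024032) + ERVB4_2-LTR_MM LTR/ERVK 1 553 (0) 99
"""


def parse_out_line(line):
    """Return the fields used here, or None for header/blank lines."""
    f = line.split()
    if len(f) < 15 or not f[0].isdigit():
        return None  # header lines start with "SW" / "score"
    return {
        "chrom": f[4],
        "start": int(f[5]),
        "end": int(f[6]),
        "strand": f[8],
        "name": f[9],
        "id": f[14],
    }


def name_stem(name):
    """Hypothetical name-based key: strip an LTR/internal suffix.

    Assumes suffixes like -LTR, -I, -int, optionally followed by _MM.
    """
    return re.sub(r"(-LTR|-I|-int)(_MM)?$", "", name)


def group_by_id(lines):
    """Collect annotation rows that share the same ID column."""
    groups = defaultdict(list)
    for line in lines:
        rec = parse_out_line(line)
        if rec is not None:
            groups[rec["id"]].append(rec)
    return dict(groups)


def mixed_name_ids(groups):
    """IDs whose fragments would not all join under name_stem."""
    return [
        rid for rid, recs in groups.items()
        if len({name_stem(r["name"]) for r in recs}) > 1
    ]


groups = group_by_id(OUT_ROWS.splitlines())
for rid, recs in groups.items():
    stems = sorted({name_stem(r["name"]) for r in recs})
    span = (min(r["start"] for r in recs), max(r["end"] for r in recs))
    print(rid, span, stems)

print(mixed_name_ids(groups))  # -> ['99']
```

On the rows above, ID 98 collapses to a single stem (ERVB4_2), while ID 99 mixes RLTR45 and ERVB4_2, which is the case a purely name-based join would miss.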
ProcessRepeats appears to link LTRs and internal regions at least to some extent. Are there cases that ProcessRepeats might miss and that would benefit from further parsing?