lobSTR seems to fail when an N is located in the middle of the STR
lobSTR seems to mis-genotype reads (or at least some reads) where the STR (in this case, a poly-G region) has an N in the middle. Is there any way to fix/change this behavior?
Example:
The following read was put into lobSTR (the STR is the homopolymer run interrupted by the N): TCTCGGCATCAACATCCAGAGTTTAGGGACCATGTCCCAGTCTCTGTGAGGTGGATGGGAAGTCAACATTAGTTGACTGAGCACCACCTGCGTGGAAGATGCAGCCCCCCCCNGCCCCATCACTGGGAATACAGTGCTGAGCAGGACAGCACCTGATGTGCGAGGGGGAAGACAGACAACAAATACATAAGCAATGGAATGTACCTTTGGCAGGCCGAT
The tags attached to the read by lobSTR afterwards are:
XS:i:46990694 XE:i:46990706 XR:Z:C XD:i:-5 XC:f:13 XG:Z:CCCCCCC XX:i:1 XM:i:-1 XQ:i:41 RG:Z:lobSTR;s66;spike_in NM:i:7
I believe this to be incorrectly genotyped, as the tags should be: XS:i:46990694 XE:i:46990706 XR:Z:C XD:i:1 XC:f:13 XG:Z:CCCCCCCCNGCCCC ...
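For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch that flags reads where the reported XG sequence stops right before an N. It is not part of lobSTR; the helper names `parse_tags` and `truncated_at_n` are made up for illustration. It assumes the tags are available as a space-separated string of standard SAM `TAG:TYPE:VALUE` fields, and that the read is given in the same orientation as XG.

```python
def parse_tags(tag_line):
    """Parse space-separated SAM optional fields (TAG:TYPE:VALUE) into a dict.

    Hypothetical helper: 'i' values become int, 'f' values become float,
    everything else is kept as a string.
    """
    casts = {"i": int, "f": float}
    tags = {}
    for field in tag_line.split():
        # Split at most twice so values that contain ':' stay intact.
        tag, typ, value = field.split(":", 2)
        tags[tag] = casts.get(typ, str)(value)
    return tags


def truncated_at_n(read, xg):
    """Return True if some occurrence of xg in read is directly followed by 'N'.

    Hypothetical diagnostic: a hit suggests the reported STR sequence
    ended at an N instead of continuing through it.
    """
    start = read.find(xg)
    while start != -1:
        end = start + len(xg)
        if end < len(read) and read[end] == "N":
            return True
        # Step by one so overlapping occurrences are also checked.
        start = read.find(xg, start + 1)
    return False


# Tags as reported above; the read is shortened to the region around the STR.
tag_line = ("XS:i:46990694 XE:i:46990706 XR:Z:C XD:i:-5 XC:f:13 "
            "XG:Z:CCCCCCC XX:i:1 XM:i:-1 XQ:i:41 "
            "RG:Z:lobSTR;s66;spike_in NM:i:7")
read_excerpt = "GATGCAG" + "C" * 8 + "N" + "G" + "C" * 4 + "ATCACTGG"

tags = parse_tags(tag_line)
flagged = truncated_at_n(read_excerpt, tags["XG"])
```

This only detects candidate reads; it does not correct the genotype, and a hit could also occur by chance if the XG sequence appears elsewhere in the read next to an N.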
Is there an easy way to fix this behavior?
As a note, we can't use HipSTR for this application because we want a genotype per read, and HipSTR does not allow us to get the level of detail we need for our studies (effectively, pooled populations of cells with unknown population size).
Brendan