Explosion on CIGARS using = and X operators
I was trying out ABRA2 on a set of alignments produced by RTG, which use the = and X CIGAR operators to indicate match/mismatch instead of the generic M operator. These CIGARS make ABRA fail with:
java.lang.UnsupportedOperationException: Unhandled cigar operator: = in: 2195212 : 124= at abra.SAMRecordWrapper.getSpanningRegions(SAMRecordWrapper.java:203) at abra.Feature.findAllOverlappingRegions(Feature.java:126) at abra.ReAligner.processChromosomeChunk(ReAligner.java:314) at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21) at abra.AbraRunnable.run(AbraRunnable.java:20) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)
BTW, this is using the 2.12 release.
Thanks for reporting this. Are you looking for the realigned reads to also use the = X format?
= and X are nice for at-a-glance eyeballing mismatches, but I'm more concerned with having the tool function than whether the output CIGARs preserve the types of operators used. It would probably be fine if adjacent =XM elements were just merged to M up front.
As a workaround I ran one set of mappings through a pre-processor that converted the CIGARS to use the legacy format, and abra seemed to run after I added the following patch:
diff --git a/src/main/java/abra/MultiSamReader.java b/src/main/java/abra/MultiSamReader.java
index e52db0e..9fa66a5 100644
--- a/src/main/java/abra/MultiSamReader.java
+++ b/src/main/java/abra/MultiSamReader.java
@@ -84,6 +84,7 @@ public class MultiSamReader implements Iterable<SAMRecordWrapper> {
return (
(!read.getDuplicateReadFlag()) &&
(!read.getReadFailsVendorQualityCheckFlag()) &&
+ (read.getReadLength() > 0) &&
(read.getMappingQuality() >= this.minMapqForAssembly || read.getReadUnmappedFlag()) &&
SAMRecordUtils.isPrimary(read)); // Was previously an id check, so supplemental / secondary alignments could be included
}
This was required because occasionally one arm of a paired read was quality trimmed to length 0 and reported in the BAM as an unmapped alignment placed with respect to its aligned mate. Without this patch abra would try to assemble the zero length read (where getReadString() returned "*") and explode.