.fastq and .maf are not consistent
I want to generate reads that are 5000 bases long with 0.85 accuracy, so I used the following command on Linux:
pbsim --data-type CLR --depth 20 --model_qc /usr/share/pbsim/models/model_qc_clr --length-min 5000 --length-max 5000 --accuracy-min 0.85 --accuracy-max 0.85 e-coli_genome.fasta
However, the even-numbered reads in the .fastq file and the .maf file are not the same, while the odd-numbered reads are consistent.
What's the problem?
PBSIM does not choose which reads to reverse-complement at random; instead, the even-numbered reads are reverse-complemented. The .maf file indicates that the read (S1_2, for example) should be reversed (-), but on inspection it still looks forward with respect to both the sample reference and the provided genome. I'm out of town at the moment, but I'll try to look at this in a few days. If you are interested in taking a stab at it, feel free. My guess is that there is some code to handle the reverse complementing and that it is not being called properly. We should also verify that the errors are applied correctly after the reverse complement.
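In case it helps whoever picks this up, here is a rough sketch of how the mismatch could be checked per read. It is not PBSIM code. It assumes the .maf output uses the usual whitespace-separated `s name start size strand srcSize text` sequence lines with `-` as the gap character, and the helper names (`reverse_complement`, `parse_maf_s_line`, `compare_read`) are made up for illustration.

```python
# Hypothetical helpers for comparing a FASTQ read with its MAF "s" line.
# Assumption: MAF sequence lines look like
#   s <name> <start> <size> <strand> <srcSize> <text>
# and gaps in <text> are written as '-'.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_maf_s_line(line):
    """Split one MAF 's' line into its seven fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a MAF sequence line: %r" % line)
    _, name, start, size, strand, src_size, text = fields
    return {
        "name": name,
        "start": int(start),
        "size": int(size),
        "strand": strand,
        "src_size": int(src_size),
        "text": text,
    }


def compare_read(fastq_seq, maf_text):
    """Classify how the FASTQ sequence relates to the ungapped MAF text.

    Returns "same", "reverse_complement", or "mismatch".
    """
    maf_seq = maf_text.replace("-", "").upper()
    fastq_seq = fastq_seq.upper()
    if fastq_seq == maf_seq:
        return "same"
    if fastq_seq == reverse_complement(maf_seq):
        return "reverse_complement"
    return "mismatch"


if __name__ == "__main__":
    # Toy record, not real PBSIM output.
    record = parse_maf_s_line("s S1_2 0 5 - 5 AAC-GG")
    print(record["strand"], compare_read("CCGTT", record["text"]))
```

Running `compare_read` over every even-numbered read would show whether the .fastq sequence equals the .maf text or its reverse complement, which should narrow down where the reverse-complement step is being skipped.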
We should also integrate the MAF file checks from https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification#MutationAnnotationFormat(MAF)Specification-MAFfilechecks into the GitHub repository, to ensure that the generated mutation alignments are valid according to the current spec.