Yleaf icon indicating copy to clipboard operation
Yleaf copied to clipboard

samtools mpileup use -B option to keep original BAQ of reads

Open zmaroti opened this issue 1 year ago • 0 comments

This is a very minor issue which usually cause no differences, however it hinders comparability on large data sets.

The problem only affects CRAM files, as the mpileup in this case need to be run with the -f reference_fasta option (at line 821 cmd += f" -f {str(reference)}").

If you look at mpileup specification in this case the on-the-fly BAQ recalculation is turned on:

BAQ is turned on when a reference file is supplied using the -f option. To disable it, use the -B option.

Accordingly, the BQ values could be be slightly different for some reads when you create an mpileup from the BAM or CRAM files of the same data. This could lead to slightly different filtration of retained bases for these pileups due to the -Q option. In reality the difefrences are very slight and should not affect the Hg classification in most cases. However this could lead to differences in private markers, and also the identified/missing markers between different data sources of the same data.

Please add the -B option to the mpileup parameters, so the classification will be not affected by the source of data (BAM or CRAM files): cmd += f" -ABQ{quality_thresh}q1 {str(bam_file)} > {str(pileupfile)}"

thanks

zmaroti avatar Feb 12 '25 13:02 zmaroti