feature request: filter reads with kmers from bam file?

Open kevfengler227 opened this issue 2 years ago • 6 comments

Would it be possible to add a feature to run meryl-lookup exclude/include on a BAM file instead of a FASTA file, and output BAM? This would be very useful for filtering reads from PacBio or ONT data in their original BAM format without going through a FASTA intermediary. Or, at the very least, could it output just a list of matching read names from the FASTA instead of generating the filtered FASTA?
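To illustrate the "list of reads" idea, here is a rough sketch in plain Python. It is not how meryl-lookup works internally, just the behavior being asked for. The function names (`read_fasta`, `canonical`, `reads_with_kmers`) are made up for this example, the k-mer set is passed in as a plain Python set rather than read from a meryl database, and matching on canonical k-mers (both strands) is an assumption.

```python
from io import StringIO

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the read ID, dropping any description after it.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def reads_with_kmers(handle, kmers, k):
    """Return names of reads that contain at least one of the given k-mers."""
    targets = {canonical(kmer.upper()) for kmer in kmers}
    hits = []
    for name, seq in read_fasta(handle):
        if any(canonical(seq[i:i + k]) in targets
               for i in range(len(seq) - k + 1)):
            hits.append(name)
    return hits


if __name__ == "__main__":
    fasta = ">r1\nACGTTTGA\n>r2 some description\nCCCCCCCC\n>r3\nTCAAACGT\n"
    # r1 contains TTTGA directly; r3 contains its reverse complement.
    print(reads_with_kmers(StringIO(fasta), {"TTTGA"}, k=5))
```

A list of names like this could then be handed to whatever tool is used to subset the original BAM, which avoids writing a filtered FASTA at all.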

Thanks, KF

kevfengler227 avatar Dec 11 '23 20:12 kevfengler227

To that end, does meryl-lookup find homopolymer-compressed kmers in the reads when the database is made with compressed kmers?

kevfengler227 avatar Dec 12 '23 16:12 kevfengler227

It appears that it does not. For removal of long reads, this may be very beneficial.
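For anyone following along, a small sketch of why this matters. Homopolymer compression collapses each run of identical bases to a single base, so a kmer taken from compressed sequence generally will not appear in an uncompressed read unless the read is compressed the same way before lookup. The helper names (`hpc`, `kmers`) are made up for this example and have nothing to do with meryl's actual code.

```python
from itertools import groupby


def hpc(seq):
    """Homopolymer-compress a sequence: collapse each run of a base to one base."""
    return "".join(base for base, _run in groupby(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers (forward strand only) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    read = "GGAAACCCTT"
    compressed_db = {"GACT"}  # a k-mer taken from compressed sequence

    # Looking up raw read k-mers against a compressed database finds nothing.
    print(kmers(read, 4) & compressed_db)

    # Compressing the read first makes the k-mer visible.
    print(kmers(hpc(read), 4) & compressed_db)
```

Since long-read errors are concentrated in homopolymer runs, doing the lookup in compressed space is what makes this attractive for PacBio/ONT read removal.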

kevfengler227 avatar Dec 13 '23 15:12 kevfengler227

Both are excellent suggestions, and the tools are in dire need of a refresh. We'll (hopefully) get it done late winter/early spring.

BAM support shouldn't be too hard.

Compressed kmer support needed a bit more engineering effort than I wanted to put into the current version, but will definitely be in the next version.

brianwalenz avatar Dec 13 '23 16:12 brianwalenz

Thanks! Even in its current form, meryl is a godsend for identifying unique kmers from a target sequence and removing reads that contain those kmers. But these two enhancements would be awesome.

kevfengler227 avatar Dec 13 '23 17:12 kevfengler227

I would still be very much interested in these two enhancements. Looking forward to the next version.

kevfengler227 avatar Jun 12 '24 18:06 kevfengler227