
question on split pairs

Open caonetto opened this issue 2 years ago • 3 comments

When running split pairs on pod5 files and the basecalled SAM, duplex-tools generates a new folder containing split pod5 files and the associated read IDs. Does this folder contain all the reads that were split, including non-duplex reads, or just the ones that were identified as duplex?

caonetto avatar Apr 17 '23 03:04 caonetto

Hi @caonetto, it will only contain the reads that were identified as duplex. For a read to be identified as duplex (and therefore split), the split point has to be somewhere in the middle, roughly 45-55% into the read, counting in bases. You can change these thresholds: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_pairs.py#L18

It doesn't work as a generic read splitter though, so you will not be getting non-duplex reads from it. Hope that answers the question!

ollenordesjo avatar Apr 17 '23 16:04 ollenordesjo
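
For readers following along, the midpoint rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in split_pairs.py: the function name `is_duplex_candidate` and the parameter names `lower` and `upper` are hypothetical, and the defaults are simply the approximate 45-55% window mentioned in the reply.

```python
def is_duplex_candidate(split_index, read_length, lower=0.45, upper=0.55):
    """Return True if the split point lies in the middle window of the read.

    split_index and read_length are counted in bases. The lower/upper
    defaults mirror the ~45-55% window quoted in the thread; the names
    are illustrative assumptions, not duplex-tools parameters.
    """
    if read_length <= 0:
        return False  # nothing to split
    fraction = split_index / read_length
    return lower <= fraction <= upper


# A split point halfway through the read falls inside the window ...
print(is_duplex_candidate(500, 1000))   # True
# ... while one near the start of the read does not.
print(is_duplex_candidate(100, 1000))   # False
```

Under this rule, a chimeric read whose adapter sits far from the midpoint would not be treated as duplex, which is consistent with the reply that the tool is not a generic read splitter.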

Thank you for your quick response. Do you have a recommendation on how to split chimeric reads with mid-strand adapters from Dorado-basecalled reads?

Thanks!

caonetto avatar Apr 18 '23 02:04 caonetto

Yes, if you are OK with splitting reads in base-space (FASTQ in, FASTQ out), then this tool should work for that: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py. Feel free to give it a go and let me know if that is sufficient for your use case.

ollenordesjo avatar Apr 18 '23 08:04 ollenordesjo
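
To make "splitting in base-space" concrete, here is a minimal Python sketch that cuts a single FASTQ record wherever an adapter sequence occurs and keeps the quality string aligned with each piece. It is not the implementation in split_on_adapter.py: the function `split_read` is hypothetical, the adapter used in the example is a made-up placeholder, and exact string matching is used only for simplicity (real reads contain sequencing errors, so a practical tool would need approximate matching).

```python
def split_read(name, seq, qual, adapter):
    """Split one read at every exact occurrence of `adapter`.

    Returns a list of (name, sequence, quality) tuples, one per
    non-empty piece, with the adapter bases themselves removed.
    Hypothetical helper for illustration; not a duplex-tools API.
    """
    if not adapter:
        raise ValueError("adapter must be a non-empty string")

    # Collect the [start, end) intervals between adapter hits.
    intervals = []
    start = 0
    while True:
        hit = seq.find(adapter, start)
        if hit == -1:
            break
        intervals.append((start, hit))
        start = hit + len(adapter)
    intervals.append((start, len(seq)))

    # Drop empty pieces (e.g. adapter at the very start or end).
    kept = [(a, b) for a, b in intervals if b > a]
    return [
        (f"{name}_{i}", seq[a:b], qual[a:b])
        for i, (a, b) in enumerate(kept, start=1)
    ]


# Example with a placeholder adapter (not a real ONT adapter sequence).
adapter = "GGCCTT"
seq = "AAAA" + adapter + "TTTT"
qual = "I" * len(seq)
for record in split_read("read1", seq, qual, adapter):
    print(record)
```

Because the split happens on basecalled sequence only, the output is new FASTQ records; the raw signal in the pod5 files is left untouched.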