question on split pairs
When running split pairs on pod5s and the basecalled SAM, duplex tools generates a new folder with split pod5 files and the associated read IDs. Does this folder contain all the reads that were split, including non-duplex reads, or just the ones that were identified as duplex?
Hi @caonetto, it will only contain the reads that were identified as duplex. The split point has to be somewhere in the middle (~45-55% into the read, counting in bases) for the read to be identified as duplex (and for it to be split). You can change these thresholds: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_pairs.py#L18
It doesn't work as a generic read splitter though, so you will not be getting non-duplex reads from it. Hope that answers the question!
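To illustrate the position check described above, here is a minimal sketch. It is not the code from split_pairs.py: the function name, the parameter names, and the inclusive bounds are assumptions made for illustration, and the 0.45/0.55 defaults only mirror the approximate window mentioned above. The real defaults are in the linked file.

```python
def is_midread_split(split_pos: int, read_len: int,
                     lo: float = 0.45, hi: float = 0.55) -> bool:
    """Return True if the split point lies in the central window of the read.

    split_pos and read_len are counted in bases. lo and hi are hypothetical
    parameter names standing in for the configurable thresholds.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    # Fraction of the way into the read at which the split occurs.
    frac = split_pos / read_len
    return lo <= frac <= hi


# A split halfway through a 10 kb read falls inside the window,
# while a split 20% into the read does not.
print(is_midread_split(5000, 10000))
print(is_midread_split(2000, 10000))
```

A read whose candidate split falls outside the window would not be treated as duplex under this check, which is why non-duplex chimeras do not show up in the output folder.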
Thank you for your quick response. Do you have a recommendation on how to split chimeric reads with mid-strand adapters in dorado-basecalled reads?
Thanks!
Yes, if you are OK with splitting reads in base-space (FASTQ in, FASTQ out), then this tool should work for that: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py. Feel free to give it a go and let me know if that is sufficient for your use case.
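To make "splitting in base-space" concrete, here is a rough sketch of the idea on a single FASTQ record. It is not how split_on_adapter.py is implemented: it looks for an exact adapter match with `str.find`, whereas a real tool would need to tolerate basecalling errors. The function name, the `_1`/`_2` read ID suffixes, and the adapter sequence in the usage lines are all made up for illustration.

```python
def split_on_exact_adapter(read_id, seq, qual, adapter, min_len=1):
    """Split one FASTQ record around the first exact adapter match.

    Returns a list of (read_id, sequence, quality) tuples. If the adapter
    is not found, the original record is returned unchanged.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return [(read_id, seq, qual)]
    end = pos + len(adapter)
    # Drop the adapter itself and keep the bases (and qualities) on each side.
    left = (f"{read_id}_1", seq[:pos], qual[:pos])
    right = (f"{read_id}_2", seq[end:], qual[end:])
    # Discard fragments shorter than min_len bases.
    return [rec for rec in (left, right) if len(rec[1]) >= min_len]


# Placeholder record: "TTTT" stands in for a mid-strand adapter.
for rec in split_on_exact_adapter("read0", "ACGATTTTGGCC", "IIIIJJJJKKKK", "TTTT"):
    print(rec)
```

Because this works purely on the basecalled sequence, the output is FASTQ only; the raw signal in the pod5 files is not split.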