Feature request: support deduplication based on [contig]:[R1 coordinate] only
I am glad to see this tool for generating consensus reads. Unfortunately, it does not work with some types of data, for example where the library prep chemistry allows multiple read pairs to be read off the same fragment with differing template lengths (differing R1-R2 spans). One example of such chemistry is Anchored Multiplex PCR: https://www.nature.com/articles/nm.3729
To handle such data, you would cluster on the start position of R1 only and leave the right_pos of the pair out of the grouping key. Is this something that could be easily supported via a command line flag?
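To make the request concrete, here is a minimal sketch of the two grouping keys. It is not gencore's actual code: `ReadPair`, `cluster`, and the `r1_only` switch are hypothetical names made up for illustration, and including the UMI in the key is an assumption on my part.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified view of an aligned read pair."""
    name: str
    contig: str
    r1_start: int  # coordinate where R1 starts (the anchored end)
    right_pos: int  # far end of the pair; varies with template length
    umi: str


def cluster(pairs, r1_only=False):
    """Group read-pair names by a deduplication key.

    r1_only=False: key includes right_pos (pairs must span the same interval).
    r1_only=True:  key is contig + R1 start + UMI, ignoring right_pos.
    """
    groups = defaultdict(list)
    for p in pairs:
        if r1_only:
            key = (p.contig, p.r1_start, p.umi)
        else:
            key = (p.contig, p.r1_start, p.right_pos, p.umi)
        groups[key].append(p.name)
    return dict(groups)


# Toy data: "a" and "b" share contig, R1 start, and UMI but differ in right_pos.
pairs = [
    ReadPair("a", "chr1", 1000, 1150, "ACGT"),
    ReadPair("b", "chr1", 1000, 1210, "ACGT"),
    ReadPair("c", "chr1", 1000, 1150, "TTGA"),
]

by_span = cluster(pairs)               # "a" and "b" end up in separate groups
by_r1 = cluster(pairs, r1_only=True)   # "a" and "b" are grouped together
print(len(by_span), len(by_r1))
```

With the span-based key, "a" and "b" are treated as different molecules; with the R1-only key they collapse into one group, which is the behavior I am asking for. Strand is left out of the key here to keep the sketch short.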
I upvote this. If I understand correctly, it would make gencore behave the same as Picard does for single-end reads.
I would also be very interested in this. My data has slightly different template lengths, so I would like to build consensus reads based on just the UMI and the R1 position.
I agree, this would be great. I'm working with PacBio sequencing of a short sequence with barcoded (UMI) variants, and I think this tool would be a good fit for building a consensus sequence of the variant for each barcode, giving a barcode-to-variant lookup table.