
Feature request: support deduplication based on [contig]:[R1 coordinate] only

Open ijhoskins opened this issue 5 years ago • 3 comments

I am glad to see this tool for generating consensus reads. Unfortunately, it does not work with some types of data, for example where the library prep chemistry allows multiple read pairs to be read off the same fragment with differing template lengths (differing R1-R2 spans). One example of such chemistry is Anchored Multiplex PCR: https://www.nature.com/articles/nm.3729

To do this you would cluster on the start position of the R1 but not include the right_pos of the pair. Is this something that could be easily supported via a command line flag?
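
To make the request concrete, here is a minimal sketch of the two clustering keys. It is not gencore's actual code: the `ReadPair` type, the `cluster` function, and the `use_right_pos` switch are hypothetical names for illustration, and including the UMI in the key is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified view of an aligned read pair."""
    name: str
    contig: str
    r1_start: int   # start position of R1
    right_pos: int  # rightmost position of the pair (R2 end)
    umi: str        # assumption: the UMI is part of the clustering key


def cluster(pairs, use_right_pos=True):
    """Group read-pair names by clustering key.

    use_right_pos=True  keys on contig, R1 start, UMI and right_pos.
    use_right_pos=False keys on contig, R1 start and UMI only,
    which is the behaviour requested in this issue.
    """
    groups = defaultdict(list)
    for p in pairs:
        key = (p.contig, p.r1_start, p.umi)
        if use_right_pos:
            key += (p.right_pos,)
        groups[key].append(p.name)
    return dict(groups)


# Pairs "a" and "b" come from the same molecule but differ in template length.
pairs = [
    ReadPair("a", "chr1", 100, 250, "ACGT"),
    ReadPair("b", "chr1", 100, 310, "ACGT"),
    ReadPair("c", "chr1", 100, 250, "TTTT"),
]

by_both_ends = cluster(pairs, use_right_pos=True)
by_r1_only = cluster(pairs, use_right_pos=False)
```

With both ends in the key, "a" and "b" land in separate clusters and never contribute to the same consensus; keyed on the R1 start alone, they are grouped together.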

ijhoskins avatar Jul 08 '20 19:07 ijhoskins

I upvote this. If I understand correctly, it would make gencore behave the same as Picard does for single-end reads.

TomaszSuchan avatar Aug 16 '20 08:08 TomaszSuchan

I would also be very interested in this. My data has slightly different template lengths, so I would like to make consensus reads with respect to just the UMI and R1.

SPPearce avatar Nov 23 '20 16:11 SPPearce

I agree, this would be great. I'm working with PacBio sequencing of a short sequence with barcoded (UMI) variants, and I think this tool would be a good way to build a consensus sequence of the variant for each barcode, giving a barcode-variant lookup table.

mel9320107 avatar Nov 22 '21 21:11 mel9320107