
Polish only INDELs or only SNPs, and polishing thresholds

Open mmontonerin opened this issue 3 years ago • 1 comments

Hi, I have tried NextPolish, and overall I am happy with it, but I miss having more options to select what gets polished, so that I can trust what it is doing to the de novo genome assemblies I am working with.

One feature I miss in NextPolish is the ability to fix either only INDELs or only SNPs, depending on the type of data being used. For example, I have a set of short reads that I would like to use to correct only INDELs, since many SNPs could simply be normal heterozygous sites, present in different proportions in different datasets.

I also miss the possibility to be a bit more conservative in polishing, and be able to select a certain depth or quality threshold for a position to be polished.

Do you plan to implement any of these functionalities in the future?

mmontonerin avatar Jun 13 '22 08:06 mmontonerin

Hi, first, thank you for your good suggestions. However, SNPs and INDELs are hard for NextPolish to tell apart: NextPolish corrects erroneous bases using kmers, so it does not distinguish between the two. For a heterozygous kmer, NextPolish selects the kmer with the highest count as the corrected kmer.
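To illustrate the "highest count wins" idea described above, here is a minimal sketch in Python. It is not NextPolish's actual code: the function names `count_kmers` and `pick_majority_kmer`, the toy reads, and the tie-breaking rule are all assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pick_majority_kmer(candidates, counts):
    """Return the candidate k-mer with the highest count.

    Hypothetical helper: ties go to the first candidate listed.
    The candidates are compared purely by count, with no notion of
    whether they differ by a substitution or by an insertion/deletion.
    """
    return max(candidates, key=lambda kmer: counts[kmer])


# Toy heterozygous site: three reads support G, two reads support C.
reads = ["ACGTA"] * 3 + ["ACCTA"] * 2
counts = count_kmers(reads, k=5)
chosen = pick_majority_kmer(["ACGTA", "ACCTA"], counts)
print(chosen)
```

In this sketch the selection step only sees kmers and their counts, which is why a SNP-only or INDEL-only mode does not fall out of such an approach naturally.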

BTW, I will consider your suggestion and maybe add some extra functions/parameters in the future.

moold avatar Jun 14 '22 01:06 moold