No -I option

Open soisa001 opened this issue 2 years ago • 7 comments

When running winnowmap, the -I option is not recognized. e.g. after generating the repetitive_k15.txt with meryl:

winnowmap -W repetitive_k15.txt -a -x map-pb -Y -L --eqx --cs -I 32G ref.fa.gz reads.fastq.gz | samtools view -hb | samtools sort -@8 > alignment_sorted.bam

Yields the following error:

[ERROR] unknown option in "-I"

The -I option is needed for a multi-part index. Thanks.

soisa001 avatar Aug 19 '23 03:08 soisa001

Sorry, multi-part indexing is not supported yet.

cjain7 avatar Aug 21 '23 04:08 cjain7

Hi @cjain7

Does this mean that it's not possible to map to genomes larger than 4G while getting accurate mapQs? For minimap2 this would be the case in the absence of the -I flag.

Thanks!

diego-rt avatar Nov 16 '23 15:11 diego-rt

I saw this change was made after the last release (v2.0.3), so the conda-installed versions still allow the -I option. I do see slight differences in alignments when increasing -I on genomes larger than 4 Gb. I wanted to confirm: is it safe to use this option, assuming no saved index is used, or was it removed because it was not working correctly in v2.0.3 either?

skoren avatar May 23 '24 11:05 skoren

Hi Sergey, I looked at this now; sorry for the delay in responding. Your question is best answered at the minimap2 help page https://lh3.github.io/minimap2/minimap2.html

Increasing the -I value will help you get slightly more accurate alignments, because having the entire reference in a single index helps to identify the best alignment for a read and to compute the mapping qualities. In my view, the -I option should not be exposed to the user during read-to-genome mapping. If it is provided, it is best to ensure that the value is larger than the genome size. Most likely, this was the reason why I omitted -I from the development code.
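
For illustration, here is a minimal sketch of one way to pick an -I value larger than the genome size. The helper names count_bases and suggest_index_size are hypothetical and not part of winnowmap or minimap2. The sketch assumes that a G suffix is read as 10^9 bases and that rounding up with one extra gigabase of headroom is acceptable, so please check it against the minimap2 manual before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of winnowmap or minimap2).

# Count the bases in a FASTA file (gzipped or plain), skipping header lines.
count_bases() {
    gzip -cdf "$1" | awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }'
}

# Turn a base count into an -I value strictly larger than the genome:
# whole gigabases (assumed to be 10^9 bases) plus one for headroom.
suggest_index_size() {
    local bases=$1
    echo "$(( bases / 1000000000 + 1 ))G"
}

# Example use, with the file names from the original report:
#   idx=$(suggest_index_size "$(count_bases ref.fa.gz)")
#   winnowmap -W repetitive_k15.txt -a -x map-pb -I "$idx" ref.fa.gz reads.fastq.gz
```

With this rule a 3.1 Gbp reference would get -I 4G and a 30 Gbp reference would get -I 31G. The index must still fit in RAM, so the memory cost of a single-part index applies.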

My guess is that minimap2 has the -I parameter because it is also used as a read overlapper and for mapping reads to very large reference databases. Even then, splitting the index with -I is sub-optimal, but it is necessary to control RAM usage.

cjain7 avatar Jun 09 '24 08:06 cjain7

The issue is that the default -I is only 4 Gb, so even a diploid human genome is too big and we'd want to increase -I (in fact, we do so when mapping to both haplotypes for all our T2T analyses: https://github.com/arangrhie/T2T-Polish/blob/master/winnowmap/map.sh). There are also much larger genomes (see https://github.com/marbl/verkko/issues/252), which is what made me start looking into this. For such large references, it sounds like it would be important to set -I larger than the genome size, so it'd be nice to keep the option available in future releases since it is being used.

skoren avatar Jun 10 '24 14:06 skoren

Sorry to jump in, but as someone who works heavily with giant genomes (30 Gbp and more), I think it is absolutely indispensable to have the -I option enabled.

diego-rt avatar Jun 10 '24 15:06 diego-rt

Understood, thank you! The -I option is now back :)

cjain7 avatar Jun 13 '24 12:06 cjain7