
inconsistent result in comparison with jellyFish and kmer_counter

Open YanaHrytsenko opened this issue 5 years ago • 3 comments

Correction: Actually, all three produced different results.


Hello, I compared the output of DSK with the outputs of JellyFish and KMER_COUNTER. I used the same .fa file for all three and generated the 7-mers with each package. All three reported the same number of 7-mers (8192), but only JellyFish and KMER_COUNTER produced identical k-mer profiles (i.e. the same k-mers with the same frequencies). DSK differs from both of them by 1344 k-mers. All k-mers were sorted lexicographically, and I used a set difference to compute these numbers. Since two out of three produced the same results, I was wondering whether DSK does anything differently. I know that DSK counts canonical k-mers, and I tried searching for the reversed k-mer string, but the k-mers still aren't in the output. Could you please let me know if there is something I am missing, perhaps in the flag settings? Thank you.

YanaHrytsenko avatar Nov 04 '20 01:11 YanaHrytsenko

Hi, thanks for bringing this up. Does it only occur with 7-mers, or did you also see it with larger k, e.g. 21-mers? I must say I almost never test with such small k-mer sizes.

rchikhi avatar Nov 04 '20 10:11 rchikhi

Hi, no, I only checked 7-mers because that is the value I needed, but as I understand it, the result should not depend on the k-mer length. When the same sequence is analyzed by different software, there should not be any inconsistencies in the frequencies of the k-mers present in the sequence for a given value of k. Isn't that true? Thank you.

YanaHrytsenko avatar Nov 04 '20 16:11 YanaHrytsenko

Hi, keep in mind that DSK and Jellyfish do not normalize k-mers the same way. See: https://github.com/GATB/dsk/#kmers-and-their-reverse-complements Also, by default DSK discards any k-mer seen only once; you can change that behavior by passing the parameter -abundance-min 1. If the issue remains, I'd appreciate a small test file so I can debug it further.

rchikhi avatar Jan 08 '21 16:01 rchikhi
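
The two points in the last reply (a different choice of canonical k-mer, and a minimum-abundance cutoff) can be illustrated with a small Python sketch. This is not DSK or Jellyfish code. It assumes that one tool picks the smaller of a k-mer and its reverse complement under the usual A<C<G<T order while the other uses A<C<T<G, which is what the linked README section is understood to describe. The function names and the `abundance_min` argument are hypothetical helpers written for this example.

```python
from collections import Counter

# Complement table for uppercase A/C/G/T only (assumption: no N or lowercase).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer, order="ACGT"):
    """Return the smaller of kmer and its reverse complement under `order`.

    order="ACGT" is plain lexicographic order; order="ACTG" is the
    assumed A<C<T<G ordering.
    """
    rank = {base: i for i, base in enumerate(order)}
    rc = reverse_complement(kmer)
    return min(kmer, rc, key=lambda s: [rank[b] for b in s])


def count_canonical(seq, k, order="ACGT", abundance_min=1):
    """Count canonical k-mers and drop those seen fewer than abundance_min times."""
    counts = Counter(
        canonical(seq[i:i + k], order) for i in range(len(seq) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= abundance_min}


def renormalize(profile, order="ACGT"):
    """Re-express a k-mer profile using a different canonical ordering."""
    out = Counter()
    for kmer, n in profile.items():
        out[canonical(kmer, order)] += n
    return dict(out)


if __name__ == "__main__":
    seq = "GAAGAA"
    lexicographic = count_canonical(seq, 3, order="ACGT")
    alternative = count_canonical(seq, 3, order="ACTG")
    print(lexicographic)
    print(alternative)
    # After mapping both profiles to the same ordering they can be compared.
    print(renormalize(alternative, "ACGT") == lexicographic)
```

Under these assumptions, a plain set difference between two profiles reports k-mers as missing when they are only labelled by the other strand. Note also that reversing the k-mer string is not enough; the lookup has to use the reverse complement. Mapping both outputs to one ordering and using the same abundance threshold in every tool should be done before comparing.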