mashmap does not give results if input sequences are too short

Open b-brankovics opened this issue 5 years ago • 1 comments

Dear Developers,

I am using mashmap to mine homologous regions for my reference genes from genomes, and I have encountered a bug in the program. If one or both of the input sequences are shorter than a certain length, the program appears to run and exits with exit code 0, but does not produce any results.

When using the following command:

mashmap -s 500 -r ref.fas -q target.fas -o output.mash

ref.fas had to be at least 16100 bp long and target.fas at least 510 bp; otherwise there was nothing in the output.

For me it would already be great if mashmap returned an exit code other than 0 in this case, because then I would know that it failed because of the input requirements, and that the empty output does not mean there are no homologous sequences.
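Until mashmap reports this itself, a pre-check along the following lines could flag inputs that are shorter than the segment length before the run. This is only a sketch: `fasta_lengths`, `sequences_below`, and `main` are hypothetical helper names, not part of mashmap, and comparing against the `-s` value is an assumption. It is a lower bound at best, since the reference in the case above needed 16100 bp, far more than 500.

```python
import sys


def fasta_lengths(path):
    """Return {sequence_id: length} for every record in a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The ID is the first whitespace-delimited token of the header.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def sequences_below(lengths, minimum):
    """Return the sorted IDs of sequences shorter than `minimum` bp."""
    return sorted(name for name, n in lengths.items() if n < minimum)


def main(argv):
    """argv = [segment_length, fasta_path, ...]; returns 1 if any sequence is too short."""
    segment_length = int(argv[0])
    found_short = False
    for path in argv[1:]:
        for name in sequences_below(fasta_lengths(path), segment_length):
            found_short = True
            print(f"{path}: {name} is shorter than {segment_length} bp", file=sys.stderr)
    return 1 if found_short else 0
```

Called as `sys.exit(main(["500", "ref.fas", "target.fas"]))` before the mashmap command, this would give a non-zero exit code for inputs that cannot contain a single 500 bp segment.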

b-brankovics avatar Oct 01 '20 13:10 b-brankovics

Yes, that is governed more or less by the algorithm. -s 500 means it will look for mappings of 500 bp-long fragments from the query to the reference, using the Jaccard similarity of the k-mers within them. It will not work if either the query or the reference is shorter than that. Because this is an approximate method, it is a bit tricky to differentiate between the two scenarios (input too short versus no homologous sequence).
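To illustrate the idea, here is a minimal sketch of Jaccard similarity over k-mer sets. It computes the exact value on full k-mer sets, which is a simplification: mashmap is an approximate method and does not compute it this way. The function names and the choice of k are for illustration only.

```python
def kmers(seq, k):
    """Return the set of all k-mers in `seq` (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k):
    """Exact Jaccard similarity of the k-mer sets of sequences a and b."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    union = set_a | set_b
    if not union:
        # Neither sequence is long enough to contain a single k-mer.
        return 0.0
    return len(set_a & set_b) / len(union)
```

A sequence shorter than k contributes no k-mers at all, so its similarity to anything is 0. In this simplified picture, that is the same value two unrelated sequences would give.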

cjain7 avatar Oct 24 '20 06:10 cjain7