suggestion: shared memory
Hi,
I like Mash very much because of its speed. However, when I want to screen multiple fastq samples for similarity against the bacterial RefSeq genomes (over 500 MB), it takes about a minute per sample just to load the RefSeq genomes.
I wonder if this could be resolved by using shared memory, like bowtie2 (--shmem) or STAR (LoadAndKeep), so that the sketches one wants to screen against only need to be loaded once?
Thanx!
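For what it's worth, the mechanism behind the kind of options mentioned above is essentially mapping the reference read-only, so the kernel page cache keeps a single copy that every concurrent process reuses instead of each run parsing it into its own heap. A minimal, hypothetical sketch of that idea (not Mash's actual code; interpreting the on-disk sketch layout is elided):

```cpp
// Hypothetical sketch: memory-map a pre-built sketch file so concurrent
// processes share one copy via the OS page cache, rather than each run
// loading the reference into private memory.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 2) { std::fprintf(stderr, "usage: %s <sketch file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); return 1; }

    // MAP_SHARED + read-only: the kernel loads each page once and reuses it
    // for every process mapping the same file, so later runs skip the load cost.
    void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { std::perror("mmap"); return 1; }

    // ... interpret `data` as the sketch layout and screen queries against it ...

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```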
This is definitely an issue we want to address, but what if there were just a flag to treat inputs separately instead of pooling them? That would essentially allow concurrent runs with a shared reference via -p, without worrying about cleaning up the shared memory later.
That is definitely a simpler approach that would improve things. On the other hand, it requires pooling the separate jobs into one job, which might not always be achievable, e.g. when jobs run on a central server but are submitted from various other clients.
For me the problem is that each input depends on the output of the previous one. I've been trying to make this work by giving the -l option a named pipe, but so far without success. Reloading the database again and again takes far too much time.
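In case it helps to spell out the pattern being attempted: the point of the named pipe is to have one long-running process that pays the reference load cost once and then screens query files as their names arrive on the FIFO. A hypothetical sketch of that shape (loadReference and screenQuery are placeholders, not Mash functions; the FIFO name queries.fifo is made up and would be created beforehand with mkfifo):

```cpp
// Hypothetical sketch of the named-pipe pattern: load the reference once,
// then screen each query file whose name is written to the FIFO.
#include <fstream>
#include <iostream>
#include <string>

// Placeholder for the expensive reference load (~1 minute in the report above).
void loadReference(const std::string& path) { /* ... */ }

// Placeholder for screening one query against the already-loaded reference.
void screenQuery(const std::string& fastq) { std::cout << "screened " << fastq << "\n"; }

int main()
{
    loadReference("refseq.msh");        // pay the load cost once

    std::ifstream fifo("queries.fifo"); // blocks until a writer opens the pipe
    std::string line;
    while (std::getline(fifo, line))    // each line names the next fastq to screen
    {
        if (!line.empty())
            screenQuery(line);          // no reload between samples
    }
    return 0;
}
```

Whether this maps onto -l depends on when the reader consumes the list, which may be why the named pipe hasn't worked so far.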