suggestion: shared memory
Hi,
I like Mash very much because of its speed. However, when I want to screen multiple fastq samples for similarity against the bacterial RefSeq genomes (over 500 MB), it takes about a minute per sample just to load the RefSeq genomes.
I wonder if this could be resolved by using shared memory, like bowtie2 (--shmem) or STAR (LoadAndKeep), so that the sketches one wants to screen against only need to be loaded once?
Thanx!
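For what it's worth, the mechanism behind the kind of options mentioned above is essentially mapping the reference read-only, so the kernel page cache keeps a single copy that every concurrent process reuses instead of each run parsing it into its own heap. A minimal, hypothetical sketch of that idea (not Mash's actual code; interpreting the on-disk sketch layout is elided):

```cpp
// Hypothetical sketch: memory-map a pre-built sketch file so concurrent
// processes share one copy via the OS page cache, rather than each run
// loading the reference into private memory.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 2) { std::fprintf(stderr, "usage: %s <sketch file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); return 1; }

    // MAP_SHARED + read-only: the kernel loads each page once and reuses it
    // for every process mapping the same file, so later runs skip the load cost.
    void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { std::perror("mmap"); return 1; }

    // ... interpret `data` as the sketch layout and screen queries against it ...

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```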
This is definitely an issue we want to address, but what if there were just a flag to treat inputs separately instead of pooling them? That would essentially allow concurrent runs with a shared reference via -p, without worrying about cleaning up the shared memory later.
That is definitely a simpler approach that would improve things. On the other hand, it requires pooling the separate jobs into one job, which might not always be achievable, e.g. when jobs run on a central server but are submitted from various other clients.
For me the problem is that each input depends on the output of the previous one. I've been trying to make this work by giving the -l option a named pipe, but so far without success. Reloading the database again and again takes far too much time.
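In case it helps to spell out the pattern being attempted: the point of the named pipe is to have one long-running process that pays the reference load cost once and then screens query files as their names arrive on the FIFO. A hypothetical sketch of that shape (loadReference and screenQuery are placeholders, not Mash functions; the FIFO name queries.fifo is made up and would be created beforehand with mkfifo):

```cpp
// Hypothetical sketch of the named-pipe pattern: load the reference once,
// then screen each query file whose name is written to the FIFO.
#include <fstream>
#include <iostream>
#include <string>

// Placeholder for the expensive reference load (~1 minute in the report above).
void loadReference(const std::string& path) { /* ... */ }

// Placeholder for screening one query against the already-loaded reference.
void screenQuery(const std::string& fastq) { std::cout << "screened " << fastq << "\n"; }

int main()
{
    loadReference("refseq.msh");        // pay the load cost once

    std::ifstream fifo("queries.fifo"); // blocks until a writer opens the pipe
    std::string line;
    while (std::getline(fifo, line))    // each line names the next fastq to screen
    {
        if (!line.empty())
            screenQuery(line);          // no reload between samples
    }
    return 0;
}
```

Whether this maps onto -l depends on when the reader consumes the list, which may be why the named pipe hasn't worked so far.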