Metabuli for eukaryotes: use cases and future plans.
The Metabuli project initially targeted prokaryotes and viruses. However, users have been reporting eukaryotic use cases with promising performance, so we plan to optimize the default settings or add parameters for eukaryotes. A pre-built database covering both eukaryotes and prokaryotes is also on the to-do list.
Here are some use cases of Metabuli with eukaryotes.
- Environmental DNA metabarcoding for surveying marine vertebrates (benchmarks). Metabuli showed promising performance in classifying simulated 12S and 16S amplicon data of marine vertebrates. Working parameters: --seq-mode 1 --min-cons-cnt-euk 4 --tie-ratio 0.99
- Testing Metabuli for fungi. With --min-cons-cnt-euk 4, Metabuli correctly classified 97% of paired-end reads simulated from a fungal species whose genome was included in the DB. The percentage dropped to 12% with the default setting (--min-cons-cnt-euk 9).
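
The parameter sets reported above can be assembled into command lines as in the sketch below. This is only an illustration: the helper `build_classify_command`, the file names, and the database and output paths are hypothetical, and the positional order (reads, database directory, output directory, job ID) is an assumption that should be checked against `metabuli classify -h` for your installed version. Only the three flags and their values come from the reports above.

```python
import shlex


def build_classify_command(reads, db_dir, out_dir, job_id,
                           min_cons_cnt_euk=4, tie_ratio=None,
                           single_end=False):
    """Build an argument list for `metabuli classify` (hypothetical helper).

    The positional order used here is an assumption; verify it with
    `metabuli classify -h` before running the command.
    """
    cmd = ["metabuli", "classify"]
    if single_end:
        # --seq-mode 1 is the value reported for the amplicon use case
        cmd += ["--seq-mode", "1"]
    cmd += list(reads) + [db_dir, out_dir, job_id]
    # Relax the eukaryote threshold from the strict default of 9
    cmd += ["--min-cons-cnt-euk", str(min_cons_cnt_euk)]
    if tie_ratio is not None:
        cmd += ["--tie-ratio", str(tie_ratio)]
    return cmd


# eDNA metabarcoding case: single-end amplicons (placeholder file names)
edna_cmd = build_classify_command(
    ["amplicons.fastq"], "db_dir", "out_dir", "edna_12s",
    min_cons_cnt_euk=4, tie_ratio=0.99, single_end=True,
)

# Fungal case: paired-end reads, only the eukaryote threshold is changed
fungi_cmd = build_classify_command(
    ["fungi_1.fastq", "fungi_2.fastq"], "db_dir", "out_dir", "fungi_test",
    min_cons_cnt_euk=4,
)

# Print the commands instead of running them, so they can be inspected first
print(shlex.join(edna_cmd))
print(shlex.join(fungi_cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run without Metabuli installed; pass the list to `subprocess.run` once the arguments have been verified.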
For now, --min-cons-cnt-euk is thought to be a critical parameter.
It sets the minimum number of consecutive k-mer hits required for a read to be classified.
The strict default value of --min-cons-cnt-euk 9 was chosen in an older version of Metabuli as a quick remedy to reduce false-positive eukaryotic hits caused by their larger genomes.
Even though we have since added noise-filtering steps to reduce false positives, we have not re-tuned the value for eukaryotes.
Based on the user reports above, setting --min-cons-cnt-euk to a lower value such as 4 or 5 should work well for now.
After some tests, we will make a new release with an optimized default value.
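
To see why the threshold matters so much, the toy sketch below applies a "minimum consecutive hits" rule to a read's per-k-mer match pattern. It is a deliberately simplified illustration and not Metabuli's actual implementation: the function names and the example hit pattern are made up, and real classification involves more than a single run-length check.

```python
def longest_consecutive_run(hits):
    """Return the length of the longest run of True values in `hits`."""
    best = current = 0
    for hit in hits:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def passes_threshold(hits, min_cons_cnt):
    """Toy rule: accept a read only if it has enough consecutive k-mer hits."""
    return longest_consecutive_run(hits) >= min_cons_cnt


# Hypothetical read: k-mer matches occur in short stretches broken by mismatches
read_hits = [True] * 5 + [False] + [True] * 4 + [False] * 2 + [True] * 3

for threshold in (4, 9):
    print(threshold, passes_threshold(read_hits, threshold))
# prints:
# 4 True
# 9 False
```

In this made-up example the longest run is 5 hits, so the read is accepted at a threshold of 4 but rejected at 9, which matches the direction of the change seen in the fungal test.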
+++ Please share your thoughts on what to optimize in Metabuli for eukaryotes, and how! Your feedback helps us a lot in making Metabuli more useful for your research.