Taxonomic profiles
It would be beneficial to output a table with the relative and absolute abundances of taxa based on individual reference packages. This should be in long-table (tidy) format for simple integration with the tidyverse and other new visualization tools.
An example table could be:
| Taxon | Lineage | Rank | RefPkg | Count | Relative_abundance |
|---|---|---|---|---|---|
| Euryarchaeota | Root; Archaea; Euryarchaeota | Phylum | McrA | 28 | 0.03 |
| Archaea | Root; Archaea | Kingdom | McrA | 56 | 0.06 |
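
As a rough illustration of how such a table might be produced, the sketch below builds long-format rows from per-lineage classification counts for a single reference package. The function name `profile_rows`, the `RANKS` list, and the input shape (pairs of lineage string and count) are assumptions made for this example, not an existing API. Counts are treated as cumulative, so a sequence assigned to a phylum also counts towards its kingdom, as in the example table.

```python
import csv
import sys
from collections import Counter

# Hypothetical rank names indexed by lineage depth (depth 0 is "Root").
RANKS = ["Root", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def profile_rows(assignments, refpkg):
    """Build long-format rows from (lineage, count) pairs for one reference package.

    Counts are cumulative: every ancestor of an assigned taxon (except Root)
    is credited with that assignment's count.
    """
    total = sum(count for _, count in assignments)
    cumulative = Counter()
    for lineage, count in assignments:
        taxa = [t.strip() for t in lineage.split(";")]
        for depth in range(1, len(taxa)):
            cumulative["; ".join(taxa[: depth + 1])] += count

    rows = []
    for lineage, count in sorted(cumulative.items()):
        taxa = lineage.split("; ")
        rank_index = min(len(taxa) - 1, len(RANKS) - 1)
        rows.append({
            "Taxon": taxa[-1],
            "Lineage": lineage,
            "Rank": RANKS[rank_index],
            "RefPkg": refpkg,
            "Count": count,
            "Relative_abundance": round(count / total, 4) if total else 0.0,
        })
    return rows


def write_profile(rows, handle):
    """Write rows as a tab-separated long table."""
    fields = ["Taxon", "Lineage", "Rank", "RefPkg", "Count", "Relative_abundance"]
    writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = [
        ("Root; Archaea; Euryarchaeota", 28),
        ("Root; Archaea", 28),
        ("Root; Bacteria", 44),
    ]
    write_profile(profile_rows(example, "McrA"), sys.stdout)
```

Because every row carries its own `RefPkg` value, tables from several reference packages can simply be concatenated before plotting.
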
Eventually, multiple marker genes, such as universal single-copy markers, should be used together to produce better estimates of abundance.
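
One possible way to combine markers, shown here only as a sketch, is to take the median of the per-marker relative abundances for each lineage. The choice of the median, the function name `combine_markers`, and the rule that a marker with no hits for a lineage contributes zero are all assumptions for illustration; the combination method has not been decided here.

```python
from collections import defaultdict
from statistics import median


def combine_markers(rows, refpkgs):
    """Return {lineage: median relative abundance across the given reference packages}.

    `rows` are long-format records with "Lineage", "RefPkg" and
    "Relative_abundance" keys. A reference package with no row for a lineage
    is assumed to contribute 0.0 (an assumption of this sketch).
    """
    per_lineage = defaultdict(dict)
    for row in rows:
        per_lineage[row["Lineage"]][row["RefPkg"]] = row["Relative_abundance"]

    combined = {}
    for lineage, by_pkg in per_lineage.items():
        values = [by_pkg.get(pkg, 0.0) for pkg in refpkgs]
        combined[lineage] = median(values)
    return combined
```

The median is less sensitive than the mean to a single marker with an unusual copy number, which is why it is used in this sketch.
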
This feature should be written to report summary data at multiple scales, from SAGs and MAGs all the way to metagenomes.
When profiling SAGs and MAGs, this should essentially function with CheckM. The only thing standing in the way is reaching a critical number of reference packages: we currently have far too few for universal single-copy marker genes.
Another requirement is defining the set of reference packages relevant to different clades.
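
A minimal sketch of how such a mapping might be represented is a dictionary keyed by lineage, where a query lineage inherits the reference packages of all its ancestors. The package names below are placeholders and the `refpkgs_for` helper is hypothetical; which packages are actually relevant to which clades still has to be defined.

```python
# Placeholder mapping from clade lineage to relevant reference packages.
CLADE_REFPKGS = {
    "Root": {"UniversalMarker1", "UniversalMarker2"},
    "Root; Archaea": {"ArchaealMarker1"},
}


def refpkgs_for(lineage, clade_map=CLADE_REFPKGS):
    """Collect reference packages defined for a lineage and all of its ancestors."""
    taxa = [t.strip() for t in lineage.split(";")]
    selected = set()
    for depth in range(1, len(taxa) + 1):
        selected |= clade_map.get("; ".join(taxa[:depth]), set())
    return selected
```
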