
Replace BMGE with ClipKIT

Open: cmorganl opened this issue 5 years ago • 1 comment

ClipKIT is a new MSA-trimming Python package. The authors indicate that the trimmed MSAs generated by ClipKIT are more "desirable" (combined RF distance and bipartition supports) than those from competing tools, including BMGE.

Using ClipKIT instead of BMGE would also simplify the installation process, since the BMGE.jar file would no longer need to be packaged with TreeSAPP. ClipKIT could instead be installed using pip or conda.

  • [x] Write a ClipKIT helper class to facilitate trimming of a FASTA file
  • [ ] Determine optimal parameters to use by comparing classification performance of ClipKIT (gappy, kpic and kpi modes) to BMGE and the raw MSA. The evaluation dataset is EggNOG v5.0 against functional and phylogenetic marker reference packages.
  • [x] Remove the treesapp/sub_binaries/ directory and support for BMGE.jar
  • [ ] Add ClipKIT to requirements.txt and the conda recipe
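
For illustration, the helper from the first checklist item could look roughly like the sketch below. This is not TreeSAPP's actual class. The names `build_clipkit_command` and `trim_fasta` are hypothetical, and it assumes the `clipkit` executable is on the PATH and accepts an input file followed by `-m` (mode) and `-o` (output), as in ClipKIT's command-line usage.

```python
import subprocess
from pathlib import Path

# Trimming modes named in the checklist above; ClipKIT offers others as well.
CLIPKIT_MODES = ("gappy", "kpic", "kpi")


def build_clipkit_command(fasta, output, mode="gappy", executable="clipkit"):
    """Return the argument list for one ClipKIT run (hypothetical helper)."""
    if mode not in CLIPKIT_MODES:
        raise ValueError(f"Unsupported ClipKIT mode: {mode!r}")
    # Assumed CLI shape: clipkit <input> -m <mode> -o <output>
    return [executable, str(fasta), "-m", mode, "-o", str(output)]


def trim_fasta(fasta, output, mode="gappy"):
    """Trim an aligned FASTA file with ClipKIT and return the output path."""
    command = build_clipkit_command(fasta, output, mode)
    # check=True raises CalledProcessError if ClipKIT exits with an error.
    subprocess.run(command, check=True)
    return Path(output)
```

Keeping command construction separate from execution means the argument list can be checked without ClipKIT being installed.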

cmorganl, Jan 08 '21 15:01

ClipKIT parameters and settings have been benchmarked using treesapp evaluate. The following code calculates a single error value for the classifications across all taxonomic ranks, weighted by the number of ranks between the classification and the correct taxon (i.e. taxonomic distance):

# For each evaluation table, sum the product of columns 5 and 7 over all rows
for f in *_evaluate*/final_outputs/clade_exclusion_performance.tsv
do
    echo "$f"
    awk '{sum+=$5*$7;} END {print sum;}' "$f"
done

The parameter set with the lowest score will be used as the default.
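
As a rough sketch of that selection step, the same sum can be computed in Python and the lowest-scoring run picked automatically. The function names are hypothetical. The column positions (5 and 7) are taken from the awk command above. Splitting on tabs and skipping non-numeric rows such as a header are assumptions about the file layout (awk splits on any whitespace and counts non-numeric fields as 0).

```python
import csv
import glob


def weighted_error(tsv_path, col_a=5, col_b=7):
    """Sum the product of two 1-indexed columns over all numeric rows."""
    total = 0.0
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            try:
                total += float(row[col_a - 1]) * float(row[col_b - 1])
            except (IndexError, ValueError):
                # Assumption: header or malformed rows contribute nothing.
                continue
    return total


def best_parameter_set(tsv_paths):
    """Return the path with the lowest score, plus all scores by path."""
    scores = {str(path): weighted_error(path) for path in tsv_paths}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    pattern = "*_evaluate*/final_outputs/clade_exclusion_performance.tsv"
    paths = sorted(glob.glob(pattern))
    if paths:
        best, scores = best_parameter_set(paths)
        for path, score in scores.items():
            print(path, score)
        print("Lowest score:", best)
```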

cmorganl, Jun 10 '22 17:06