
Animations in methods documentation

Open evanroyrees opened this issue 4 years ago • 8 comments

Autometa Methods Documentation Video Series

This is a series of videos with voiceovers describing Autometa methods, using visual aids generated with manim animations.

📝 Scripts / 🎥 Scenes / 🎨 Animations / 🔊 Voiceovers

Video Overview

  • Length Filtering
  • Coverage Calculation
  • ORF Calling
  • Marker annotation
  • Taxon Assignment
  • K-mer counting
  • K-mer embedding
  • 3-Dimensions of Clustering Features
  • Binning with Recursive DBSCAN
  • Unclustered Recruitment

Video 1 - Length filtering

  • [x] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 2 - Coverage calculation

  • [x] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 3 - ORF calling

  • [x] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 4 - Marker annotation

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 5 - Taxon assignment

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 6 - K-mer counting

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 7 - K-mer embedding

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

~~Video 8 - 3 dimensions of clustering features~~

  • [ ] ~~Script~~
  • [ ] ~~Animations~~
  • [ ] ~~Voiceover~~
  • [ ] ~~Editing~~

Video 9 - Binning with recursive DBSCAN

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

Video 10 - Unclustered recruitment

  • [ ] Script
  • [ ] Animations
  • [ ] Voiceover
  • [ ] Editing

NOTE: Manim has been forked and is being maintained in two separate repositories.

From the ManimCommunity/manim repository:

This fork is updated more frequently than the original (3b1b/manim), and it's recommended to use this fork if you'd like to use Manim for your own projects.
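
For orientation, a minimal ManimCommunity scene looks like the sketch below. This is only an illustrative example (the scene name and text are placeholders), not one of the scenes from the presentation referenced below.

#!/usr/bin/env python
# example_scene.py -- illustrative sketch only, not an Autometa animation
from manim import Scene, Text, Write, FadeOut

class ExampleScene(Scene):
    def construct(self):
        title = Text("Autometa")
        self.play(Write(title))    # animate the text being drawn
        self.wait(1)               # hold for one second
        self.play(FadeOut(title))  # fade the text out

Render with, e.g., manim -pql example_scene.py ExampleScene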

Example Scenes from Professor Jason Kwan's ASP presentation

evanroyrees avatar Jun 07 '21 23:06 evanroyrees

I've started a new animations repo at https://github.com/jason-c-kwan/Autometa_animations. Can we make the above list into a checklist?

So far I've made a sort of logo animation that can go at the beginning of each video.

jason-c-kwan avatar Jun 10 '21 20:06 jason-c-kwan

For the first video, I think I will show the BHtSNE graph for the same dataset as we change the length cutoff, while coloring the points based on the ground truth. @WiscEvan can you write instructions here on how I would use the Autometa entrypoints to basically do K-mer counting on all contigs, then do normalization and BHtSNE on different subsets? I am thinking I could programmatically chop up an internal Pandas table, but I just need a quick reminder of how to incorporate the BH-tSNE part into the script.

jason-c-kwan avatar Jun 11 '21 15:06 jason-c-kwan

k-mer counting

Subset the k-mer counts, then normalize, embed, and write each subset's embedding to a sample-size-specific filepath:

#!/usr/bin/env python
# Save to subset_and_embed_counts.py
import argparse
import os
import pandas as pd
from autometa.common import kmers

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="file path to kmer counts table", required=True)
    parser.add_argument("--output", help="directory path to store sample size embeddings", required=True)
    args = parser.parse_args()
    # Ensure the output directory exists before writing embeddings
    os.makedirs(args.output, exist_ok=True)
    # Read in counts table, i.e. counts.tsv
    df = pd.read_csv(args.input, sep='\t', index_col='contig')

    # subsample by num. contigs specified
    sample_sizes = [100, 200, 400, 800, 1000, 5000, 10000]
    for sample_size in sample_sizes:
        if sample_size > df.shape[0]:
            # Skip sample sizes larger than the number of available contigs
            print(f"Skipping sample size {sample_size} (only {df.shape[0]} contigs available)")
            continue
        counts_subset = df.sample(n=sample_size)
        norm_df = kmers.normalize(counts_subset, method="am_clr")
        # Embed normalized counts and write to the sample size filepath
        sample_embed_filepath = os.path.join(args.output, f"kmers.sample_size_{sample_size}.embedded.tsv")
        embedded_df = kmers.embed(
            kmers=norm_df,
            out=sample_embed_filepath,
            pca_dimensions=50,
            method="bhsne",
            embed_dimensions=2
        )
        print(f"Wrote sample size embedding to {sample_embed_filepath}")

if __name__ == "__main__":
    main()

Compute counts, subset and write embeddings

# Set filepaths and parameters
fasta="metagenome.fna"
kmers="counts.tsv"
outdir="embeddings"  # directory path to store embeddings
size=5
cpus=2

## Compute counts
autometa-kmers --fasta $fasta --kmers $kmers --size $size --cpus $cpus

## Subset and write embeddings
python subset_and_embed_counts.py --input $kmers --output $outdir
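
Plot an embedding colored by ground truth

For the length-filtering animation, each embedding could then be scattered and colored by the ground-truth genome of each contig. Below is a minimal sketch, assuming a hypothetical ground_truth.tsv with contig and genome columns (not an Autometa output) and that the embedded table's coordinate columns are named x_1 and x_2; adjust names to match the actual output.

#!/usr/bin/env python
# plot_embedding_by_truth.py -- illustrative sketch only (not part of Autometa).
# Assumes a hypothetical ground_truth.tsv with 'contig' and 'genome' columns,
# and embedding coordinate columns 'x_1'/'x_2' (rename if the output differs).
import pandas as pd
import matplotlib.pyplot as plt

embedded = pd.read_csv("kmers.sample_size_5000.embedded.tsv", sep="\t", index_col="contig")
truth = pd.read_csv("ground_truth.tsv", sep="\t", index_col="contig")
# Keep only contigs present in both tables
df = embedded.join(truth, how="inner")

fig, ax = plt.subplots(figsize=(8, 8))
for genome, group in df.groupby("genome"):
    ax.scatter(group["x_1"], group["x_2"], s=5, label=genome)
ax.set_xlabel("BH-tSNE 1")
ax.set_ylabel("BH-tSNE 2")
ax.legend(fontsize="small", markerscale=2)
fig.savefig("embedding_by_ground_truth.png", dpi=300)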

evanroyrees avatar Jun 11 '21 16:06 evanroyrees

I realized that the tasks should probably be made a bit more granular

jason-c-kwan avatar Jun 16 '21 21:06 jason-c-kwan

@WiscEvan Could you check out the script of video 1 and let me know what you think?

jason-c-kwan avatar Jun 16 '21 21:06 jason-c-kwan

I've updated my comment above so the checklist can be reviewed right when arriving at the page, and so everything is in one place.

evanroyrees avatar Jun 17 '21 18:06 evanroyrees

The video 8 idea seems to be a bit redundant with video 2. In order to explain why we need to calculate the coverage, I will have to bring up how we use two BH-tSNE dimensions and one coverage dimension. I'm not sure what else would need to be said?

jason-c-kwan avatar Jun 23 '21 19:06 jason-c-kwan

Yeah, video 8 could probably be grouped in with video 9

evanroyrees avatar Jun 23 '21 20:06 evanroyrees