Animations in methods documentation
Autometa Methods Documentation Video Series
This is a series of videos with voiceovers describing Autometa methods, with visual aids generated by manim animations.
📝:movie_camera: :speaker:🎨 Scripts / Scenes / Animations / Voiceovers 📝 :movie_camera::speaker:🎨
Video Overview
- Length Filtering
- Coverage Calculation
- ORF Calling
- Marker annotation
- Taxon Assignment
- K-mer counting
- K-mer embedding
- 3-Dimensions of Clustering Features
- Binning with Recursive DBSCAN
- Unclustered Recruitment
Video 1 - Length filtering
- [x] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 2 - Coverage calculation
- [x] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 3 - ORF calling
- [x] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 4 - Marker annotation
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 5 - Taxon assignment
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 6 - K-mer counting
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 7 - K-mer embedding
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
~~Video 8 - 3 dimensions of clustering features~~
- [ ] ~~Script~~
- [ ] ~~Animations~~
- [ ] ~~Voiceover~~
- [ ] ~~Editing~~
Video 9 - Binning with recursive DBSCAN
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
Video 10 - Unclustered recruitment
- [ ] Script
- [ ] Animations
- [ ] Voiceover
- [ ] Editing
NOTE: Manim has been forked and is being maintained in two separate repositories.
From the ManimCommunity/manim repository:
This fork is updated more frequently than his [Grant Sanderson's], and it's recommended to use this fork if you'd like to use Manim for your own projects.
- Manim Community website 🔗
- Manim Community manim documentation :memo:
- Grant Sanderson's manim repository :octocat:
Example Scenes from Professor Jason Kwan's ASP presentation
- K-mer counting (5:49 - 6:42)
- K-mer embedding (7:02)
- 3 dimensions of clustering features (8:53 - 9:02)
- Binning with recursive DBSCAN (7:18 - 8:46)
I've started a new animations repo at https://github.com/jason-c-kwan/Autometa_animations. Can we make the above list into a checklist?
So far I've made a sort of logo animation that can go at the beginning of each video.
For the first video, I think I will show the BH-tSNE graph for the same dataset as we change the length cutoff, while coloring the points based on the ground truth. @WiscEvan can you write instructions here on how I would use the Autometa entrypoints to do k-mer counting on all contigs, then do normalization and BH-tSNE on different subsets? I am thinking I could programmatically chop up an internal pandas table, but I just need a quick reminder of how to incorporate the BH-tSNE part into the script.
k-mer counting
Subset k-mers, then normalize, embed, and write to a file path named by sample size
#!/usr/bin/env python
# Save to subset_and_embed_counts.py
import argparse
import os

import pandas as pd
from autometa.common import kmers


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="file path to kmer counts table", required=True)
    parser.add_argument("--output", help="directory path to store sample size embeddings", required=True)
    args = parser.parse_args()
    # Make sure the output directory exists before writing to it
    os.makedirs(args.output, exist_ok=True)
    # Read in table
    # i.e. counts.tsv
    df = pd.read_csv(args.input, sep="\t", index_col="contig")
    # subsample by num. contigs specified
    sample_sizes = [100, 200, 400, 800, 1000, 5000, 10000]
    for sample_size in sample_sizes:
        # DataFrame.sample raises ValueError when n exceeds the number of rows
        if sample_size > len(df):
            print(f"Skipping sample size {sample_size}: only {len(df)} contigs available")
            continue
        counts_subset = df.sample(n=sample_size)
        norm_df = kmers.normalize(counts_subset, method="am_clr")
        # Write embedded to sample size path
        sample_embed_filepath = os.path.join(args.output, f"kmers.sample_size_{sample_size}.embedded.tsv")
        kmers.embed(
            kmers=norm_df,
            out=sample_embed_filepath,
            pca_dimensions=50,
            method="bhsne",
            embed_dimensions=2,
        )
        print(f"Wrote sample size embedding to {sample_embed_filepath}")


if __name__ == "__main__":
    main()
Compute counts, subset and write embeddings
# Set filepaths and parameters
fasta="metagenome.fna"
kmers="counts.tsv"
outdir="path to store embeddings"
size=5
cpus=2
## Compute counts
autometa-kmers --fasta "$fasta" --kmers "$kmers" --size "$size" --cpus "$cpus"
## Subset and write embeddings
# Quote the variables so that paths containing spaces are passed as one argument
python subset_and_embed_counts.py --input "$kmers" --output "$outdir"
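The script above subsets by a random number of contigs, whereas the plan for video 1 is to vary the length cutoff. A minimal sketch of that variant follows, using only pandas. The `subset_by_length` helper and the `lengths` Series (contig lengths indexed by contig name) are hypothetical and not part of Autometa; the toy table stands in for a real `counts.tsv`. Each subset could then be passed to `kmers.normalize` and `kmers.embed` in the same way as the sampled subsets above.

```python
import pandas as pd


def subset_by_length(counts: pd.DataFrame, lengths: pd.Series, cutoff: int) -> pd.DataFrame:
    """Keep rows of `counts` whose contig length is at least `cutoff`.

    `lengths` is a hypothetical Series of contig lengths indexed by contig name.
    Contigs missing from `lengths` are dropped.
    """
    # Align lengths to the counts index; missing contigs become NaN,
    # and NaN >= cutoff evaluates to False, so they are excluded.
    mask = lengths.reindex(counts.index) >= cutoff
    return counts.loc[mask.to_numpy()]


# Toy data standing in for a real k-mer counts table
counts = pd.DataFrame(
    {"AAAAA": [3, 0, 7], "AAAAC": [1, 2, 5]},
    index=pd.Index(["contig_1", "contig_2", "contig_3"], name="contig"),
)
lengths = pd.Series({"contig_1": 500, "contig_2": 3000, "contig_3": 12000})

for cutoff in (1000, 5000):
    subset = subset_by_length(counts, lengths, cutoff)
    print(f"cutoff={cutoff}: {len(subset)} contigs retained")
```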
I realized that the tasks should probably be made a bit more granular
@WiscEvan Could you check out the script of video 1 and let me know what you think?
I've updated my comment so that the checklist is in one place and can be reviewed on arriving at the page.
The video 8 idea seems to be a bit redundant with video 2. In order to explain why we need to calculate the coverage, I will have to bring up how we use two BH-tSNE dimensions and one coverage dimension. I'm not sure what else would need to be said?
Yeah, video 8 could probably be grouped in with video 9
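For reference when scripting the combined video, the three clustering features discussed above (two BH-tSNE dimensions plus one coverage dimension) can be assembled into a single table. This is a sketch under stated assumptions: the column names `x_1`, `x_2`, and `coverage`, the `build_clustering_features` helper, and the toy values are illustrative only and are not taken from Autometa's output.

```python
import pandas as pd


def build_clustering_features(embedded: pd.DataFrame, coverage: pd.Series) -> pd.DataFrame:
    """Join a 2-D embedding with per-contig coverage into a 3-column feature table.

    Assumes `embedded` has columns "x_1" and "x_2" and that `coverage` is a
    Series named "coverage"; both are indexed by contig name.
    """
    # Inner join keeps only contigs present in both inputs
    features = embedded.join(coverage, how="inner")
    return features[["x_1", "x_2", "coverage"]]


# Toy values for illustration only
embedded = pd.DataFrame(
    {"x_1": [0.5, -1.2, 3.3], "x_2": [1.1, 0.4, -2.0]},
    index=pd.Index(["contig_1", "contig_2", "contig_3"], name="contig"),
)
coverage = pd.Series({"contig_1": 12.5, "contig_3": 40.0}, name="coverage")

features = build_clustering_features(embedded, coverage)
print(features)
```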