The Bash tutorial is missing a step
User checklist
- [x] Are you using the latest release? Yes
- [x] Are you using python 3? Yes
- [x] Did you check previous issues to see if this has already been mentioned? Yes
- [x] Are you using a Mac or Linux machine? Linux machine
Description
Hello there, I am trying to learn how to use Autometa from your tutorial posted on ReadTheDocs, but there is a piece missing in Step 4 - Single Copy Markers. There is no step detailing how to create the hmmscan.tsv file. Could you please provide this information?
Expected Behavior
I checked the rest of the document, but there is no other mention of how the learners are supposed to make the hmmscan.tsv file.
System Environment
- Operating System: Linux
- RAM: A ton. Our university's cluster has over 45k nodes
- Disk: N/A
Tasks/Command(s)
- [x] Task 1
- [x] Task 2
- [x] Task 3
- [x] etc.
Log/Error information generated by Autometa.
Hello,
I appreciate you looking at my inquiry. I noticed that there is a step missing in the tutorial on your ReadTheDocs page. No step is given to show us how to create the hmmscan.tsv file before we need it to complete Step 4 - Single Copy Markers.
For example, I followed the tutorial exactly, but I keep getting an error telling me that the hmmscan.tsv file does not exist. I will paste the directions for Step 4 here:
Create a markers directory to hold the marker genes
mkdir -p $HOME/Autometa/autometa/databases/markers
Change the default download path to the directory created above
autometa-config \
--section databases \
--option markers \
--value $HOME/Autometa/autometa/databases/markers
Download single-copy marker genes
autometa-update-databases --update-markers
hmmpress the marker genes
hmmpress -f $HOME/Autometa/autometa/databases/markers/bacteria.single_copy.hmm
hmmpress -f $HOME/Autometa/autometa/databases/markers/archaea.single_copy.hmm
autometa-markers \
--orfs $HOME/tutorial/78mbp_metagenome.orfs.faa \
--kingdom bacteria \
--hmmscan $HOME/tutorial/78mbp_metagenome.hmmscan.tsv \
--out $HOME/tutorial/78mbp_metagenome.markers.tsv \
--parallel \
--cpus 4 \
--seed 42
When I follow this code, I get this error:
ERROR:
[10/23/2024 04:39:10 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 0 --tblout /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.hmmscan.tsv /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.orfs.faa
[10/23/2024 04:39:10 PM WARNING] autometa.common.external.hmmscan: Make sure your hmm profiles are pressed! hmmpress -f /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm
Traceback (most recent call last):
File "/home/lyisrae1/.conda/envs/autometa/bin/autometa-markers", line 10, in
Additionally, I have had a few syntax issues in Step 5 - Taxonomy, but those were very easy to fix, so they are not the problem here. Could I please get some clarification on Step 4 on ReadTheDocs? I cannot finish the tutorial properly without that step. autometa_tutorial.txt
Here is the process I did without the markers:
autometa-binning \
--kmers /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.bacteria.kmers.embedded.tsv \
--coverages /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.coverages.tsv \
--gc-content /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.gc_content.tsv \
--output-binning /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.binning.tsv \
--output-main /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.main.tsv \
--clustering-method dbscan \
--completeness 20 \
--purity 90 \
--cov-stddev-limit 25 \
--gc-stddev-limit 5 \
--taxonomy /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.taxonomy.tsv \
--starting-rank superkingdom \
--rank-filter superkingdom \
--rank-name-filter bacteria
And here is the error message:
usage: autometa-binning [-h] --kmers filepath --coverages filepath --gc-content filepath --markers filepath --output-binning filepath [--output-main filepath] [--clustering-method {dbscan,hdbscan}] [--completeness 0 < float <= 100] [--purity 0 < float <= 100] [--cov-stddev-limit float] [--gc-stddev-limit float] [--taxonomy filepath] [--starting-rank {superkingdom,phylum,class,order,family,genus,species}] [--reverse-ranks] [--rank-filter {superkingdom,phylum,class,order,family,genus,species}] [--rank-name-filter RANK_NAME_FILTER] [--verbose] [--cpus int]
autometa-binning: error: the following arguments are required: --markers
https://autometa.readthedocs.io/en/latest/bash-step-by-step-tutorial.html#single-copy-markers
Thank you for your time, Leone
It looks like the documentation needs to be fixed, but this is mostly an issue with file paths.
1. At the start, the tutorial says to download metagenome.fna.gz to $HOME/tutorial/test_data/
but later the file has a different name: $HOME/tutorial/test_data/78mbp_metagenome.fna
So, to start, you should download metagenome.fna.gz and save it to/as $HOME/tutorial/test_data/78mbp_metagenome.fna
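The renaming step could be sketched as follows. This is only an illustration: `stage_metagenome` is a hypothetical helper (not part of Autometa), and it assumes the download is a gzip-compressed FASTA that later steps expect under the uncompressed name. If the file turns out not to be gzip-compressed, it is simply copied.

```shell
#!/usr/bin/env bash
# stage_metagenome: hypothetical helper, not part of Autometa.
# Writes SRC to DEST, decompressing it first if SRC is gzip-compressed.
stage_metagenome() {
    local src=$1 dest=$2
    # Make sure the destination directory exists.
    mkdir -p "$(dirname "$dest")"
    if gzip -t "$src" 2>/dev/null; then
        # SRC is a valid gzip file: write the decompressed contents.
        gzip -dc "$src" > "$dest"
    else
        # Not gzip-compressed: copy it under the new name.
        cp "$src" "$dest"
    fi
}

# Example call (paths taken from the tutorial; adjust to where you saved the download):
# stage_metagenome "$HOME/tutorial/test_data/metagenome.fna.gz" \
#                  "$HOME/tutorial/test_data/78mbp_metagenome.fna"
```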
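2. There is a separate issue in the ORF creation step: the --output-prots path has a stray leading "a" (a78mbp_metagenome.orfs.faa), so the file written there does not match the --orfs path that autometa-markers is given later.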
Current:
autometa-orfs \
--assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
--output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
--output-prots $HOME/tutorial/a78mbp_metagenome.orfs.faa \
--cpus 40
Should be:
autometa-orfs \
--assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
--output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
--output-prots $HOME/tutorial/78mbp_metagenome.orfs.faa \
--cpus 40
That should fix the error.
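A path mismatch like this can be caught before running autometa-markers by checking that each input file exists. The following is a minimal sketch; `require_files` is a hypothetical helper written for illustration and is not part of Autometa.

```shell
#!/usr/bin/env bash
# require_files: hypothetical helper, not part of Autometa.
# Reports every path that is missing or empty on stderr and
# returns non-zero if at least one path failed the check.
require_files() {
    local missing=0 path
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call before Step 4 (path taken from the tutorial):
# require_files "$HOME/tutorial/78mbp_metagenome.orfs.faa" || exit 1
```

With the typo in place, the ORFs would have been written to a78mbp_metagenome.orfs.faa, so a check like this on 78mbp_metagenome.orfs.faa would report the file as missing before hmmscan is ever started.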
CC: @shaneroesemann @jason-c-kwan, the documentation needs to be updated accordingly. Also, the error message generated by autometa-markers is not helpful: the subprocess stderr should be captured and printed, rather than just warning "Make sure your hmm profiles are pressed!", which wasn't the issue here.
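To illustrate the stderr suggestion: the idea is to keep whatever the external command wrote to stderr and show it alongside the failure, instead of a fixed hint. Autometa itself is Python, so this shell sketch is not its implementation; `run_and_report` is a hypothetical wrapper used only to show the pattern.

```shell
#!/usr/bin/env bash
# run_and_report: hypothetical wrapper, not Autometa code.
# Runs the given command, captures its stderr in a temporary file,
# and replays it afterwards. On failure, a header naming the command
# and its exit status is printed first. Returns the command's status.
run_and_report() {
    local errfile status=0
    errfile=$(mktemp)
    "$@" 2>"$errfile" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed (exit $status): $*" >&2
    fi
    # Replay the captured stderr so the real cause is visible.
    cat "$errfile" >&2
    rm -f "$errfile"
    return "$status"
}

# Example call (any command works; this one fails on purpose):
# run_and_report ls /path/that/does/not/exist
```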