funannotate test error
Hi, I recently installed the funannotate software, and an error was reported during testing.
(funannotate) [liuyuanchao@login ~]$ funannotate check --show-versions
Checking dependencies for 1.8.7
You are running Python v 3.9.7. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.3
natsort: 8.0.0
numpy: 1.21.4
pandas: 1.3.4
psutil: 5.8.0
requests: 2.26.0
scikit-learn: 1.0.1
scipy: 1.7.0
seaborn: 0.11.2
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$PASAHOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/config/
ERROR: FUNANNOTATE_DB not set. export FUNANNOTATE_DB=/path/to/dir
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.22-r1101
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.9
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 36 external dependencies are installed
But when we run the command funannotate test, some errors appear:
(funannotate) [liuyuanchao@login /public/home/liuyuanchao/ceshi]
$funannotate test -t all --cpus 20
#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################
#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 20
#########################################################
[Dec 07 08:49 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 08:49 AM]: Running funanotate v1.8.7
[Dec 07 08:49 AM]: Soft-masking simple repeats with tantan
[Dec 07 08:49 AM]: Repeat soft-masking finished:
Masked genome: /public/home/liuyuanchao/ceshi/test-mask_d0755630-7e24-4b9a-90e8-50d205a3cd9f/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
#########################################################
SUCCESS: funannotate mask test complete.
#########################################################
#########################################################
Running funannotate predict unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 20 --species Awesome testicus
#########################################################
[Dec 07 08:49 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 08:49 AM]: Running funannotate v1.8.7
[Dec 07 08:49 AM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya
#########################################################
Traceback (most recent call last):
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in
Did you try what the error indicates?
[Dec 07 08:49 AM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya
Yes, it is indeed a problem with the database. I specified the database location before downloading the database, but funannotate still failed to find it, and I don’t know where things went wrong. It now seems that I have to specify the location every time I start it.
You can either specify the database location on the command line or set the FUNANNOTATE_DB environment variable.
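For anyone hitting the same message, here is a minimal sketch of a sanity check for that variable. It only confirms that FUNANNOTATE_DB is set in the current environment and points at an existing directory. The function name `check_funannotate_db` is made up for illustration and is not part of funannotate, and the check says nothing about what the directory should contain.

```python
import os


def check_funannotate_db(env=None):
    """Return (ok, message) describing the state of FUNANNOTATE_DB.

    Hypothetical helper for illustration; not part of funannotate.
    """
    env = os.environ if env is None else env
    value = env.get("FUNANNOTATE_DB")
    if not value:
        # Same situation that `funannotate check` reports as "not set".
        return False, "FUNANNOTATE_DB is not set in this environment"
    if not os.path.isdir(value):
        return False, f"FUNANNOTATE_DB points to a missing directory: {value}"
    return True, f"FUNANNOTATE_DB is set to an existing directory: {value}"


if __name__ == "__main__":
    ok, message = check_funannotate_db()
    print(message)
```

If this reports that the variable is not set in a fresh login shell, the `export FUNANNOTATE_DB=/path/to/dir` line suggested by `funannotate check` was probably only typed in an earlier session; putting it in a shell startup file is one common way to make it persist.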
After running funannotate test to completion, the report shows that there are still some errors. Can these errors be ignored?
(funannotate) [liuyuanchao@login /public/home/liuyuanchao/ceshi]
$funannotate test -t all --cpus 20
#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################
#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 20
#########################################################
[Dec 07 09:24 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:24 AM]: Running funanotate v1.8.7
[Dec 07 09:24 AM]: Soft-masking simple repeats with tantan
[Dec 07 09:24 AM]: Repeat soft-masking finished:
Masked genome: /public/home/liuyuanchao/ceshi/test-mask_4903f361-aac1-46a8-bdb2-b407e72b501c/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
#########################################################
SUCCESS: funannotate mask test complete.
#########################################################
#########################################################
Running funannotate predict unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --
#########################################################
[Dec 07 09:24 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:24 AM]: Running funannotate v1.8.7
[Dec 07 09:24 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:24 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Dec 07 09:24 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:24 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:24 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:24 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:24 AM]: Exonerate finished: found 1,431 alignments
[Dec 07 09:24 AM]: Running GeneMark-ES on assembly
[Dec 07 09:27 AM]: 1,559 predictions from GeneMark
[Dec 07 09:27 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 09:32 AM]: 370 BUSCO predictions validated
[Dec 07 09:32 AM]: Running Augustus gene prediction using saccharomyces parameters
[Dec 07 09:34 AM]: 1,489 predictions from Augustus
[Dec 07 09:34 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:34 AM]: Found 370 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:34 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:34 AM]: 2 predictions from SNAP
[Dec 07 09:34 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:37 AM]: 1,776 predictions from GlimmerHMM
[Dec 07 09:37 AM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 1332
Augustus HiQ 2 371
GeneMark 1 1559
GlimmerHMM 1 1776
snap 1 2
Total - 5040
[Dec 07 09:37 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 09:39 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 09:39 AM]: 1,689 total gene models from EVM
[Dec 07 09:39 AM]: Generating protein fasta files from 1,689 EVM models
[Dec 07 09:39 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 09:39 AM]: Found 101 gene models to remove: 0 too short; 0 span gaps; 101 transposable elements
[Dec 07 09:39 AM]: 1,588 gene models remaining
[Dec 07 09:39 AM]: Predicting tRNAs
[Dec 07 09:39 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 09:39 AM]: Generating GenBank tbl annotation file
[Dec 07 09:39 AM]: Converting to final Genbank format
[Dec 07 09:39 AM]: Collecting final annotation files for 1,700 total gene models
[Dec 07 09:39 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Dec 07 09:39 AM]: Your next step might be functional annotation, suggested commands:
Run InterProScan (Docker required): funannotate iprscan -i annotate -m docker -c 20
Run antiSMASH: funannotate remote -i annotate -m antismash -e [email protected]
Annotate Genome: funannotate annotate -i annotate --cpus 20 --sbt yourSBTfile.txt
[Dec 07 09:39 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Dec 07 09:39 AM]: Add species parameters to database:
funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json
#########################################################
SUCCESS: funannotate predict test complete.
#########################################################
#########################################################
Running funannotate predict BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 20 --species Awesome busco
#########################################################
[Dec 07 09:39 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:39 AM]: Running funannotate v1.8.7
[Dec 07 09:39 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:39 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus busco
genemark selftraining
glimmerhmm busco
snap busco
[Dec 07 09:39 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:40 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:40 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:40 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:40 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 09:40 AM]: Running GeneMark-ES on assembly
[Dec 07 09:42 AM]: 1,562 predictions from GeneMark
[Dec 07 09:42 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 09:46 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Dec 07 09:48 AM]: 370 BUSCO predictions validated
[Dec 07 09:48 AM]: Training Augustus using BUSCO gene models
[Dec 07 09:48 AM]: Augustus initial training results:
Feature Specificity Sensitivity
nucleotides 99.5% 83.8%
exons 71.8% 59.7%
genes 86.5% 59.3%
[Dec 07 09:48 AM]: Running Augustus gene prediction using awesome_busco parameters
[Dec 07 09:48 AM]: 1,303 predictions from Augustus
[Dec 07 09:48 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:48 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:48 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:49 AM]: 2 predictions from SNAP
[Dec 07 09:49 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:52 AM]: 1,768 predictions from GlimmerHMM
[Dec 07 09:52 AM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 989
Augustus HiQ 2 314
GeneMark 1 1562
GlimmerHMM 1 1768
snap 1 2
Total - 4635
[Dec 07 09:52 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 09:53 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 09:53 AM]: 1,662 total gene models from EVM
[Dec 07 09:53 AM]: Generating protein fasta files from 1,662 EVM models
[Dec 07 09:53 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 09:53 AM]: Found 82 gene models to remove: 0 too short; 0 span gaps; 82 transposable elements
[Dec 07 09:53 AM]: 1,580 gene models remaining
[Dec 07 09:53 AM]: Predicting tRNAs
[Dec 07 09:54 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 09:54 AM]: Generating GenBank tbl annotation file
[Dec 07 09:54 AM]: Converting to final Genbank format
[Dec 07 09:54 AM]: Collecting final annotation files for 1,692 total gene models
[Dec 07 09:54 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Dec 07 09:54 AM]: Your next step might be functional annotation, suggested commands:
Run InterProScan (Docker required): funannotate iprscan -i annotate -m docker -c 20
Run antiSMASH: funannotate remote -i annotate -m antismash -e [email protected]
Annotate Genome: funannotate annotate -i annotate --cpus 20 --sbt yourSBTfile.txt
[Dec 07 09:54 AM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Dec 07 09:54 AM]: Add species parameters to database:
funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json
#########################################################
SUCCESS: funannotate predict BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 20 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################
[Dec 07 09:54 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:54 AM]: Running funannotate v1.8.7
[Dec 07 09:54 AM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Dec 07 09:54 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:54 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark pretrained
glimmerhmm pretrained
snap pretrained
[Dec 07 09:54 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:54 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:54 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:54 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:54 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 09:54 AM]: Running GeneMark-ES on assembly
[Dec 07 09:57 AM]: 1,565 predictions from GeneMark
[Dec 07 09:57 AM]: Running Augustus gene prediction using awesome_busco parameters
[Dec 07 09:57 AM]: 1,303 predictions from Augustus
[Dec 07 09:57 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:57 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:57 AM]: Running SNAP gene prediction, using pre-trained HMM profile
[Dec 07 09:58 AM]: 2 predictions from SNAP
[Dec 07 09:58 AM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Dec 07 09:58 AM]: 1,768 predictions from GlimmerHMM
[Dec 07 09:58 AM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 989
Augustus HiQ 2 314
GeneMark 1 1565
GlimmerHMM 1 1768
snap 1 2
Total - 4638
[Dec 07 09:58 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 10:00 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 10:00 AM]: 1,661 total gene models from EVM
[Dec 07 10:00 AM]: Generating protein fasta files from 1,661 EVM models
[Dec 07 10:00 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 10:00 AM]: Found 82 gene models to remove: 0 too short; 0 span gaps; 82 transposable elements
[Dec 07 10:00 AM]: 1,579 gene models remaining
[Dec 07 10:00 AM]: Predicting tRNAs
[Dec 07 10:00 AM]: 112 tRNAscan models are valid (non-overlapping)
CMD: funannotate train -i test.softmasked.fa --single rna-seq.illumina.fastq.gz --nanopore_mrna rna-seq.nanopore.fastq.gz -o rna-seq --cpus 20 --jaccard_clip --species Awesome rna
#########################################################
[Dec 07 10:10 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:10 AM]: Running 1.8.7
[Dec 07 10:10 AM]: Adapter and Quality trimming SE reads with Trimmomatic
Traceback (most recent call last):
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in
sys.exit(main())
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
mod.main(arguments)
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/train.py", line 958, in main
trim_single = runTrimmomaticSE(s_reads, cpus=args.cpus)
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/train.py", line 58, in runTrimmomaticSE
lib.Fzip_inplace(output, cpus)
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 404, in Fzip_inplace
runSubprocess(cmd, '.', log)
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 661, in runSubprocess
proc = subprocess.Popen(cmd, cwd=dir, stdout=subprocess.PIPE,
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'pigz'
#########################################################
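One possible reading of the traceback above: `subprocess.Popen` raised PermissionError rather than FileNotFoundError, which suggests that something named `pigz` was found on PATH but could not be executed. The sketch below is only a diagnostic aid under that assumption. It lists every PATH entry that contains the given name and reports whether it is a regular file with execute permission for the current user. The helper name `find_candidates` is invented for this example.

```python
import os


def find_candidates(name, path=None):
    """List (location, is_executable) for every PATH directory containing `name`.

    Hypothetical helper for illustration; not part of funannotate.
    """
    path = os.environ.get("PATH", "") if path is None else path
    results = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # lexists also catches broken symlinks, which cannot be executed either.
        if os.path.lexists(candidate):
            executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
            results.append((candidate, executable))
    return results


if __name__ == "__main__":
    for location, executable in find_candidates("pigz"):
        status = "executable" if executable else "NOT executable"
        print(f"{location}: {status}")
```

Run inside the activated funannotate environment, any entry reported as not executable would be a candidate for the cause of the error. An empty result would point elsewhere, for example at a PATH directory that the current user cannot search.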
Now running funannotate predict using RNA-seq training data
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o rna-seq --cpus 20 --min_training_models 150 --species Awesome rna
#########################################################
[Dec 07 10:11 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:11 AM]: Running funannotate v1.8.7
[Dec 07 10:11 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 10:11 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus busco
genemark selftraining
glimmerhmm busco
snap busco
[Dec 07 10:11 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 10:11 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 10:11 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 10:11 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 10:11 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 10:11 AM]: Running GeneMark-ES on assembly
[Dec 07 10:13 AM]: 1,562 predictions from GeneMark
[Dec 07 10:13 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 10:17 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Dec 07 10:19 AM]: 370 BUSCO predictions validated
[Dec 07 10:19 AM]: Training Augustus using BUSCO gene models
[Dec 07 10:19 AM]: Augustus initial training results:
Feature Specificity Sensitivity
nucleotides 99.5% 83.8%
exons 71.8% 59.7%
genes 86.5% 59.3%
[Dec 07 10:19 AM]: Running Augustus gene prediction using awesome_rna parameters
[Dec 07 10:20 AM]: 1,303 predictions from Augustus
[Dec 07 10:20 AM]: Pulling out high quality Augustus predictions
[Dec 07 10:20 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 10:20 AM]: Running SNAP gene prediction, using training data: rna-seq/predict_misc/busco.final.gff3
[Dec 07 10:20 AM]: 0 predictions from SNAP
[Dec 07 10:20 AM]: SNAP prediction failed, moving on without result
[Dec 07 10:20 AM]: Running GlimmerHMM gene prediction, using training data: rna-seq/predict_misc/busco.final.gff3
[Dec 07 10:23 AM]: 1,775 predictions from GlimmerHMM
[Dec 07 10:23 AM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 989
Augustus HiQ 2 314
GeneMark 1 1562
GlimmerHMM 1 1775
Total - 4640
[Dec 07 10:23 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 10:25 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 10:25 AM]: 1,682 total gene models from EVM
[Dec 07 10:25 AM]: Generating protein fasta files from 1,682 EVM models
[Dec 07 10:25 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 10:25 AM]: Found 96 gene models to remove: 0 too short; 0 span gaps; 96 transposable elements
[Dec 07 10:25 AM]: 1,586 gene models remaining
[Dec 07 10:25 AM]: Predicting tRNAs
[Dec 07 10:25 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 10:25 AM]: Generating GenBank tbl annotation file
[Dec 07 10:25 AM]: Converting to final Genbank format
[Dec 07 10:26 AM]: Collecting final annotation files for 1,698 total gene models
[Dec 07 10:26 AM]: Funannotate predict is finished, output files are in the rna-seq/predict_results folder
[Dec 07 10:26 AM]: Your next step might be functional annotation, suggested commands:
Run InterProScan (Docker required): funannotate iprscan -i rna-seq -m docker -c 20
Run antiSMASH: funannotate remote -i rna-seq -m antismash -e [email protected]
Annotate Genome: funannotate annotate -i rna-seq --cpus 20 --sbt yourSBTfile.txt
[Dec 07 10:26 AM]: Training parameters file saved: rna-seq/predict_results/awesome_rna.parameters.json
[Dec 07 10:26 AM]: Add species parameters to database:
funannotate species -s awesome_rna -a rna-seq/predict_results/awesome_rna.parameters.json
#########################################################
Now running funannotate update to run PASA-mediated UTR addition and multiple transcripts
CMD: funannotate update -i rna-seq --cpus 20
#########################################################
[Dec 07 10:26 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:26 AM]: Running 1.8.7
[Dec 07 10:26 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Dec 07 10:26 AM]: Found relevant files in rna-seq/training, will re-use them:
Single reads: rna-seq/training/single.fq.gz
[Dec 07 10:26 AM]: Reannotating Awesome rna, NCBI accession: None
[Dec 07 10:26 AM]: Previous annotation consists of: 1,586 protein coding gene models and 112 non-coding gene models
[Dec 07 10:26 AM]: Adapter and Quality trimming SE reads with Trimmomatic
Traceback (most recent call last):
File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in