remote AntiSMASH fails with duplicate CDS features
Are you using the latest release?
Version 1.8.14
Describe the bug
Running the tutorial for genome+RNAseq with our own data, we successfully get through most of the annotation stages.
However, when we run the remote antiSMASH portion, it runs for a while and then fails with a timeout/no route to host error.
When I manually check the status of the antiSMASH job (fungi-73669e74-7826-4ae1-bc9a-223d2a47caed) on the antiSMASH website, it lists the following error:
Submitted: Mar 7, 2023 13:48:22
Status: failed: Job returned errors: ERROR 08/03 08:07:00 Multiple CDS features have the same location: 615475:616117
Last status change: Mar 8, 2023 00:07:00
What command did you issue?
previously:
funannotate train -i ../RepeatMasker/Poreg_genome_final.fasta.masked -o Poreg_fun \
    --left ../Trim_Galore/Poreg_RNA_1_val_1.fq.gz ../Trim_Galore/Poreg_RNA_2_val_1.fq.gz \
    --right ../Trim_Galore/Poreg_RNA_1_val_2.fq.gz ../Trim_Galore/Poreg_RNA_2_val_2.fq.gz \
    --stranded RF --jaccard_clip --species "Pseudozyma oregonense" \
    --strain UnNamed --cpus 12 --no_trimmomatic
funannotate predict -i ../RepeatMasker/Poreg_genome_final.fasta.masked -o Poreg_fun \
    --species "Pseudozyma oregonense" --strain UnNamed \
    --cpus 12 --protein_evidence ../Related_species/Phubeiensis_GCF_000403515.1_ASM40351v1_protein.faa $FUNANNOTATE_DB/uniprot_sprot.fasta
funannotate update -i Poreg_fun --cpus 12
command with error: funannotate remote -i Poreg_fun -m antismash -e [email protected]
Logfiles
sge.funantismash.e134764.log.txt
OS/Install Information
I masked out some of our cluster-specific paths.
Checking dependencies for 1.8.14
You are running Python v 3.8.15. Now checking python packages...
biopython: 1.80
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.24.1
pandas: 1.5.2
psutil: 5.9.4
requests: 2.28.1
scikit-learn: 1.2.0
scipy: 1.10.0
seaborn: 0.12.2
All 11 python packages installed
You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.38
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/LOCATION/databases/funannotate/current
$PASAHOME=/LOCATION/funannotate-1.8.14/opt/pasa-2.5.2
$TRINITY_HOME=/LOCATION/funannotate-1.8.14/opt/trinity-2.8.5
$EVM_HOME=/LOCATION/funannotate-1.8.14/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/LOCATION/Funannotate/augustus/config
$GENEMARK_PATH=/LOCATION/gmes_linux_64
All 6 environmental variables are set
Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: 2.1.9
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2023-02-17
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.7
proteinortho: 6.1.7
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 37 external dependencies are installed
I got the same problem here.
Running funannotate remote for antismash, I get the following error:
ERROR 22/05 11:48:33 Multiple CDS features have the same location: [144149:144395](-)
I suspect this comes from alternative splicing: the gene has two mRNA features that differ only in their intron positions, which lie outside the coding region, so both transcripts get the same CDS at the same location:
     gene            complement(144150..144941)
                     /locus_tag="FUN_020060"
     mRNA            complement(join(144150..144784,144846..144941))
                     /locus_tag="FUN_020060"
                     /product="hypothetical protein"
     mRNA            complement(join(144150..144447,144510..144941))
                     /locus_tag="FUN_020060"
                     /product="hypothetical protein"
     CDS             complement(144150..144395)
                     /locus_tag="FUN_020060"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="ncbi:FUN_020060-T1"
                     /translation="MRSTAYMHNSQCFSTFPSFHVRIPLSCPSPKDLSAFCDSCPCLV
                     SLGYSSISRLGCVITESGDLISSNNGRDMSSPILNQP"
     CDS             complement(144150..144395)
                     /locus_tag="FUN_020060"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="ncbi:FUN_020060-T2"
                     /translation="MRSTAYMHNSQCFSTFPSFHVRIPLSCPSPKDLSAFCDSCPCLV
                     SLGYSSISRLGCVITESGDLISSNNGRDMSSPILNQP"
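To see how many genes are affected before deciding on a fix, the GenBank file can be scanned for CDS features that share a location. The following is only a rough sketch using the Python standard library, not anything funannotate or antiSMASH provides: the function names are made up for illustration, and it assumes a plain GenBank flat file. Note that the interval in the error, [144149:144395](-), appears to be the same as complement(144150..144395) written with a zero-based start.

```python
import re
from collections import defaultdict

# A feature-key line: at most 5 leading spaces, the key, then a location.
# Qualifier and continuation lines are indented further or start with "/".
FEATURE = re.compile(
    r"^\s{0,5}(?P<key>[A-Za-z_0-9']+)\s+"
    r"(?P<loc>(?:complement|join|order|[<>]?\d)\S*)\s*$"
)
PROTEIN_ID = re.compile(r'/protein_id="([^"]+)"')


def iter_cds(genbank_text):
    """Yield one dict per CDS feature: record name, location, protein_id."""
    record = None  # name from the most recent LOCUS line, if any
    cds = None     # the CDS currently being read
    for line in genbank_text.splitlines():
        stripped = line.strip()
        match = FEATURE.match(line)
        if line.startswith("LOCUS") or match:
            if cds is not None:
                yield cds
            cds = None
            if line.startswith("LOCUS"):
                parts = line.split()
                record = parts[1] if len(parts) > 1 else None
            elif match["key"] == "CDS":
                cds = {"record": record, "location": match["loc"],
                       "protein_id": None, "wrapping": True}
        elif cds is not None:
            if stripped.startswith("/"):
                cds["wrapping"] = False  # qualifiers started, location is complete
                found = PROTEIN_ID.search(stripped)
                if found:
                    cds["protein_id"] = found.group(1)
            elif cds["wrapping"]:
                cds["location"] += stripped  # long location wrapped onto this line
    if cds is not None:
        yield cds


def duplicate_cds_locations(genbank_text):
    """Map (record, location) to protein_ids for locations used by 2+ CDS features."""
    groups = defaultdict(list)
    for cds in iter_cds(genbank_text):
        groups[(cds["record"], cds["location"])].append(cds["protein_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Trimmed copy of the features above (no LOCUS line, so the record is None).
SAMPLE = '''\
     gene            complement(144150..144941)
                     /locus_tag="FUN_020060"
     CDS             complement(144150..144395)
                     /protein_id="ncbi:FUN_020060-T1"
     CDS             complement(144150..144395)
                     /protein_id="ncbi:FUN_020060-T2"
'''
print(duplicate_cds_locations(SAMPLE))
```

On the trimmed sample this reports one location, complement(144150..144395), shared by FUN_020060-T1 and FUN_020060-T2. Locations are compared per record, so identical coordinates on different contigs are not counted as duplicates.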
Would there be any way to overcome this?
If I remove all the identical CDS features, would the resulting output still be parseable in the annotation step?
Thanks in advance for the help.
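For anyone who wants to try the removal approach on a copy of the GenBank file, here is a minimal sketch of what "remove the identical CDS" could look like. It is an untested workaround idea, not a confirmed fix: the function names are hypothetical, it assumes standard GenBank indentation (feature keys in column 6, qualifiers in column 22), it keeps the first CDS at each location and leaves the second mRNA without a CDS, and whether antiSMASH accepts the result or funannotate can map it back is not established here.

```python
import re

# Standard GenBank layout: feature keys start in column 6, qualifiers in column 22.
FEATURE = re.compile(r"^ {5}(?P<key>\S+)\s+(?P<loc>\S+)\s*$")


def split_blocks(lines):
    """Group lines into blocks that start at a feature key or a column-1 keyword."""
    blocks = []
    for line in lines:
        starts_block = bool(FEATURE.match(line)) or line[:1] not in ("", " ", "\n")
        if starts_block or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)  # qualifier, continuation, or sequence line
    return blocks


def cds_location(block):
    """Return the full location of a CDS block (joining wrapped lines), else None."""
    match = FEATURE.match(block[0])
    if not match or match["key"] != "CDS":
        return None
    location = match["loc"]
    for line in block[1:]:
        text = line.strip()
        if text.startswith("/"):
            break  # first qualifier reached, location is complete
        location += text
    return location


def drop_duplicate_cds(genbank_text):
    """Keep the first CDS at each location within a record; drop later repeats."""
    seen = set()
    kept = []
    for block in split_blocks(genbank_text.splitlines(keepends=True)):
        if block[0].startswith("LOCUS"):
            seen.clear()  # same coordinates on another contig are not duplicates
        location = cds_location(block)
        if location is not None:
            if location in seen:
                continue  # skip the whole duplicate CDS block
            seen.add(location)
        kept.extend(block)
    return "".join(kept)
```

The function takes the file contents as a string and returns the filtered text, so it is easy to diff the result against the original before submitting anything.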