Funannotate update: Feature overlapped by 2 identical-length genes but has no cross-reference
Are you using the latest release?
I'm using the latest version available in Docker (v1.8.15).
Describe the bug
After I ran funannotate update, I got a message about several gene models that need fixing:
FUN_010869 Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870 Feature overlapped by 2 identical-length genes but has no cross-reference
When I checked the tbl file, I saw that genes FUN_010869 and FUN_010870 indeed span identical coordinates (780550 to 781621) but have different CDSs:
780550 781621 gene
locus_tag FUN_010869
780550 781239 mRNA
781297 781621
product hypothetical protein
transcript_id gnl|ncbi|FUN_010869-T1_mrna
protein_id gnl|ncbi|FUN_010869-T1
780738 781239 CDS
781297 781484
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_010869-T1_mrna
protein_id gnl|ncbi|FUN_010869-T1
780550 781243 mRNA
781297 781621
product hypothetical protein
transcript_id gnl|ncbi|FUN_010869-T2_mrna
protein_id gnl|ncbi|FUN_010869-T2
780738 781243 CDS
781297 781363
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_010869-T2_mrna
protein_id gnl|ncbi|FUN_010869-T2
780550 781621 gene
locus_tag FUN_010870
780550 781621 mRNA
product hypothetical protein
transcript_id gnl|ncbi|FUN_010870-T1_mrna
protein_id gnl|ncbi|FUN_010870-T1
780738 781247 CDS
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_010870-T1_mrna
protein_id gnl|ncbi|FUN_010870-T1
How should I fix the file? Should I move mRNA and CDS features from FUN_010870 to FUN_010869, and remove FUN_010870? Would that work? Thank you!
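To locate every such pair before editing by hand, a small script can scan the tbl file for gene features that share the same contig, start, and end. The following is only a minimal sketch: the function name `find_identical_span_genes` is hypothetical, it assumes the usual layout where a `locus_tag` qualifier directly follows its `gene` line, and it only reports the duplicates; it does not decide how they should be merged.

```python
from collections import defaultdict


def find_identical_span_genes(tbl_text):
    """Return lists of locus_tags whose gene features share contig, start and end."""
    spans = defaultdict(list)
    contig = None
    pending = None  # (start, end) of a gene feature still waiting for its locus_tag
    for line in tbl_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].startswith(">Feature"):
            # New contig section; the sequence identifier follows the keyword.
            contig = parts[1] if len(parts) > 1 else None
            pending = None
        elif len(parts) == 3 and parts[2] == "gene":
            # Strip partial markers ("<", ">") so partial genes compare equal too.
            pending = (parts[0].lstrip("<>"), parts[1].lstrip("<>"))
        elif parts[0] == "locus_tag" and len(parts) > 1 and pending is not None:
            spans[(contig, *pending)].append(parts[1])
            pending = None
    return [tags for tags in spans.values() if len(tags) > 1]
```

Calling it on the contents of the tbl file (for example `find_identical_span_genes(open(path).read())`, where `path` is your own file) returns one list of locus_tags per duplicated span.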
What command did you issue?
singularity run ../singularity/funannotate.sif funannotate update -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28
Logfiles
[Sep 15 09:30 PM]: Funannotate update is finished, output files are in the analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred//update_results folder
[Sep 15 09:30 PM]: There are 5 gene models that need to be fixed.
[Sep 15 09:30 PM]: Manually edit the tbl file analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.tbl, then run:
funannotate fix -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.gbk -t analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_p$
[Sep 15 09:30 PM]: After the problematic gene models are fixed, you can proceed with functional annotation.
[Sep 15 09:30 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m docker -c 28
Run antiSMASH:
funannotate remote -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m antismash -e [email protected]
Annotate Genome:
funannotate annotate -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28 --sbt yourSBTfile.txt
-------------------------------------------------------
-------------------------------------------------------
-------------------------------------------------------
FUN_000415 Feature begins or ends in gap starting at 1124202
FUN_002683 Feature begins or ends in gap starting at 552774
FUN_010869 Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870 Feature overlapped by 2 identical-length genes but has no cross-reference
-------------------------------------------------------
OS/Install Information
- output of
funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.15
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.81
goatools: 1.2.3
matplotlib: 3.7.1
natsort: 8.3.1
numpy: 1.22.4
pandas: 2.0.1
psutil: 5.9.5
requests: 2.28.2
scikit-learn: 0.24.2
scipy: 1.5.3
seaborn: 0.12.2
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.1.6
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed
How did you solve this issue? @nextgenusfs I ran into the same problem, and I am also getting several other errors, for example:
FUN_029468 Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029469 Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029502 Feature begins or ends in gap starting at 1627056
FUN_029528 CDS not contained within cross-referenced mRNA
When I look at the tbl file, these two genes indeed have identical coordinates and do not cross-reference each other. How should this be fixed?
686279 688092 gene
locus_tag FUN_029468
686279 686935 mRNA
687021 688092
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T1_mrna
protein_id gnl|ncbi|FUN_029468-T1
686582 686935 CDS
687021 687803
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T1_mrna
protein_id gnl|ncbi|FUN_029468-T1
686279 686935 mRNA
687021 687376
687540 688092
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T2_mrna
protein_id gnl|ncbi|FUN_029468-T2
686308 686935 CDS
687021 687376
687540 687803
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T2_mrna
protein_id gnl|ncbi|FUN_029468-T2
686279 687376 mRNA
687540 688092
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T3_mrna
protein_id gnl|ncbi|FUN_029468-T3
686308 687376 CDS
687540 687688
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029468-T3_mrna
protein_id gnl|ncbi|FUN_029468-T3
686279 688092 gene
locus_tag FUN_029469
686279 688092 mRNA
product hypothetical protein
transcript_id gnl|ncbi|FUN_029469-T1_mrna
protein_id gnl|ncbi|FUN_029469-T1
686308 687426 CDS
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029469-T1_mrna
protein_id gnl|ncbi|FUN_029469-T1
For this gene, it complains that the feature begins or ends in a gap starting at 1627056:
1627057 1625448 gene
locus_tag FUN_029502
1627057 1626623 mRNA
1626464 1626390
1626323 1626248
1626122 1625448
product hypothetical protein
transcript_id gnl|ncbi|FUN_029502-T1_mrna
protein_id gnl|ncbi|FUN_029502-T1
1626732 1626623 CDS
1626464 1626390
1626323 1626248
1626124 1625822
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029502-T1_mrna
protein_id gnl|ncbi|FUN_029502-T1
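The reported gap position (1627056) sits directly next to the 1627057 end of this gene, which may mean the feature abuts a run of Ns. One way to check is to look at the bases around each endpoint of the feature in the contig sequence. The sketch below is a heuristic only: the function name `endpoints_touching_gap` and the `flank` parameter are hypothetical, it assumes the contig has already been loaded as a plain string, and it is not the validator's own rule.

```python
def endpoints_touching_gap(seq, start, end, flank=1):
    """Return the feature endpoints (1-based) that have an N within `flank` bases.

    `start` and `end` are taken as written in the tbl file, so a minus-strand
    feature may be passed with start > end.
    """
    hits = []
    for pos in (start, end):
        # Convert the 1-based position to a 0-based window of +/- `flank` bases.
        lo = max(pos - 1 - flank, 0)
        hi = min(pos + flank, len(seq))
        if "N" in seq[lo:hi].upper():
            hits.append(pos)
    return hits
```

An empty list means neither endpoint is next to an N under this heuristic; a non-empty list names the endpoint that would need trimming away from the gap.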
And for this one it says "CDS not contained within cross-referenced mRNA", but I cannot really see what is wrong with it...
2522815 2523725 gene
locus_tag FUN_029528
2522815 2523725 mRNA
product hypothetical protein
transcript_id gnl|ncbi|FUN_029528-T1_mrna
protein_id gnl|ncbi|FUN_029528-T1
2522901 2523377 CDS
codon_start 1
product hypothetical protein
transcript_id gnl|ncbi|FUN_029528-T1_mrna
protein_id gnl|ncbi|FUN_029528-T1
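A quick coordinate check can at least rule out the obvious reading of that message. The sketch below tests whether every CDS segment lies inside some mRNA exon; the function name `cds_within_mrna` is hypothetical, and the real validator may apply stricter rules (for example on matching intron boundaries) that this does not cover.

```python
def cds_within_mrna(mrna_exons, cds_segments):
    """Return True if every CDS segment lies inside at least one mRNA exon.

    Intervals are (start, end) pairs as written in the tbl file; minus-strand
    pairs with start > end are normalised before comparison.
    """
    def norm(interval):
        return (min(interval), max(interval))

    exons = [norm(e) for e in mrna_exons]
    return all(
        any(es <= cs and ce <= ee for es, ee in exons)
        for cs, ce in map(norm, cds_segments)
    )
```

For FUN_029528, `cds_within_mrna([(2522815, 2523725)], [(2522901, 2523377)])` returns True, so on coordinates alone the CDS is contained and the cause probably lies elsewhere. Run on the FUN_029502 coordinates quoted above, the same check returns False, because the last CDS segment begins at 1626124 while the last mRNA exon begins at 1626122.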
Emiliania_huxleyi_CCMP1516.models-need-fixing.txt Emiliania_huxleyi_CCMP1516.zip
I'd appreciate any help to be able to finish this annotation :)