Funannotate update: Feature overlapped by 2 identical-length genes but has no cross-reference

Open metalichen opened this issue 2 years ago • 1 comments

Are you using the latest release?

I'm using the latest version available in Docker (v1.8.15).

Describe the bug

After I ran funannotate update, I got a message about several gene models that need fixing:

FUN_010869      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870      Feature overlapped by 2 identical-length genes but has no cross-reference

When I checked the tbl file, I saw that genes FUN_010869 and FUN_010870 do span identical coordinates (780550 to 781621) but have different mRNA and CDS features:

780550	781621	gene
			locus_tag	FUN_010869
780550	781239	mRNA
781297	781621
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010869-T1_mrna
			protein_id	gnl|ncbi|FUN_010869-T1
780738	781239	CDS
781297	781484
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010869-T1_mrna
			protein_id	gnl|ncbi|FUN_010869-T1
780550	781243	mRNA
781297	781621
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010869-T2_mrna
			protein_id	gnl|ncbi|FUN_010869-T2
780738	781243	CDS
781297	781363
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010869-T2_mrna
			protein_id	gnl|ncbi|FUN_010869-T2
780550	781621	gene
			locus_tag	FUN_010870
780550	781621	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010870-T1_mrna
			protein_id	gnl|ncbi|FUN_010870-T1
780738	781247	CDS
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_010870-T1_mrna
			protein_id	gnl|ncbi|FUN_010870-T1
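
To list every such pair across a large tbl file, gene features can be grouped by their coordinates. The following is a minimal sketch, not part of funannotate: the function name `genes_sharing_span` is hypothetical, and it assumes the tab-separated feature table layout shown above, with each `gene` line followed by its `locus_tag` qualifier and contigs introduced by `>Feature` lines.

```python
from collections import defaultdict


def genes_sharing_span(tbl_lines):
    """Return {(contig, start, end): [locus_tags]} for spans used by 2+ genes."""
    spans = defaultdict(list)
    contig = None   # name taken from the most recent ">Feature" header
    pending = None  # span of a gene line still waiting for its locus_tag
    for raw in tbl_lines:
        line = raw.rstrip("\n")
        if line.startswith(">Feature"):
            contig = line[len(">Feature"):].strip()
            pending = None
            continue
        cols = line.split("\t")
        # Feature lines carry start, end and the feature key in columns 1-3.
        if len(cols) >= 3 and cols[2] == "gene":
            pending = (contig, cols[0], cols[1])
        # Qualifier lines have three empty columns, then key and value.
        elif len(cols) >= 5 and cols[3] == "locus_tag" and pending is not None:
            spans[pending].append(cols[4])
            pending = None
    # Keep only spans shared by more than one locus_tag.
    return {span: tags for span, tags in spans.items() if len(tags) > 1}
```

Passing the open tbl file to `genes_sharing_span` returns one entry per shared span. Coordinates are compared as strings, so partial markers such as `<` or `>` are treated as part of the coordinate.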

How should I fix the file? Should I move mRNA and CDS features from FUN_010870 to FUN_010869, and remove FUN_010870? Would that work? Thank you!

What command did you issue?

singularity run ../singularity/funannotate.sif funannotate update -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28

Logfiles

[Sep 15 09:30 PM]: Funannotate update is finished, output files are in the analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred//update_results folder
[Sep 15 09:30 PM]: There are 5 gene models that need to be fixed.
[Sep 15 09:30 PM]: Manually edit the tbl file analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.tbl, then run:

funannotate fix -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.gbk -t analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_p$

[Sep 15 09:30 PM]: After the problematic gene models are fixed, you can proceed with functional annotation.
[Sep 15 09:30 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m docker -c 28

Run antiSMASH:
funannotate remote -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m antismash -e [email protected]

Annotate Genome:
funannotate annotate -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28 --sbt yourSBTfile.txt
-------------------------------------------------------

-------------------------------------------------------
-------------------------------------------------------
FUN_000415      Feature begins or ends in gap starting at 1124202
FUN_002683      Feature begins or ends in gap starting at 552774
FUN_010869      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870      Feature overlapped by 2 identical-length genes but has no cross-reference
-------------------------------------------------------

OS/Install Information

  • output of funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.15
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.81
goatools: 1.2.3
matplotlib: 3.7.1
natsort: 8.3.1
numpy: 1.22.4
pandas: 2.0.1
psutil: 5.9.5
requests: 2.28.2
scikit-learn: 0.24.2
scipy: 1.5.3
seaborn: 0.12.2
All 11 python packages installed


You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1   
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
        ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
        ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.1.6
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17] 
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed  
        ERROR: pslCDnaFilter not installed  
        ERROR: signalp not installed

metalichen • Sep 15 '23 22:09

How did you solve this issue? @nextgenusfs, I ran into the same issue, and I am also getting several other errors, for example:

FUN_029468      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029469      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029502      Feature begins or ends in gap starting at 1627056
FUN_029528      CDS not contained within cross-referenced mRNA

When I look at the tbl file, these two genes do span identical coordinates and do not cross-reference each other. How should this be fixed?

686279	688092	gene
			locus_tag	FUN_029468
686279	686935	mRNA
687021	688092
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T1_mrna
			protein_id	gnl|ncbi|FUN_029468-T1
686582	686935	CDS
687021	687803
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T1_mrna
			protein_id	gnl|ncbi|FUN_029468-T1
686279	686935	mRNA
687021	687376
687540	688092
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T2_mrna
			protein_id	gnl|ncbi|FUN_029468-T2
686308	686935	CDS
687021	687376
687540	687803
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T2_mrna
			protein_id	gnl|ncbi|FUN_029468-T2
686279	687376	mRNA
687540	688092
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T3_mrna
			protein_id	gnl|ncbi|FUN_029468-T3
686308	687376	CDS
687540	687688
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029468-T3_mrna
			protein_id	gnl|ncbi|FUN_029468-T3
686279	688092	gene
			locus_tag	FUN_029469
686279	688092	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029469-T1_mrna
			protein_id	gnl|ncbi|FUN_029469-T1
686308	687426	CDS
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029469-T1_mrna
			protein_id	gnl|ncbi|FUN_029469-T1

For this gene, it complains that the feature begins or ends in a gap starting at 1627056:

1627057	1625448	gene
			locus_tag	FUN_029502
1627057	1626623	mRNA
1626464	1626390
1626323	1626248
1626122	1625448
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029502-T1_mrna
			protein_id	gnl|ncbi|FUN_029502-T1
1626732	1626623	CDS
1626464	1626390
1626323	1626248
1626124	1625822
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029502-T1_mrna
			protein_id	gnl|ncbi|FUN_029502-T1
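
Whether a feature endpoint such as 1627057 really falls on a gap can be checked against the assembly itself. The sketch below is only an illustration under stated assumptions: it treats a gap as a run of `N` characters, uses 1-based inclusive coordinates as in the tbl file, and the helper names `gap_runs` and `endpoint_in_gap` are hypothetical. The scaffold sequence is not shown in this thread, so no result for FUN_029502 is claimed.

```python
import re


def gap_runs(seq):
    """Return 1-based inclusive (start, end) for each run of N in seq."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", seq)]


def endpoint_in_gap(start, end, gaps):
    """Return the first gap containing either feature endpoint, else None.

    Each endpoint is tested on its own, so minus-strand features
    (start > end, as in the tbl above) need no special handling.
    """
    for g_start, g_end in gaps:
        if g_start <= start <= g_end or g_start <= end <= g_end:
            return (g_start, g_end)
    return None
```

With the scaffold loaded as a string `seq`, `endpoint_in_gap(1627057, 1625448, gap_runs(seq))` would report the gap, if any, that touches either end of the gene.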

And for this one it says "CDS not contained within cross-referenced mRNA", but I cannot really see what is wrong with it...

2522815	2523725	gene
			locus_tag	FUN_029528
2522815	2523725	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029528-T1_mrna
			protein_id	gnl|ncbi|FUN_029528-T1
2522901	2523377	CDS
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_029528-T1_mrna
			protein_id	gnl|ncbi|FUN_029528-T1
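
A coordinate-only containment test can help rule out the obvious cause. This is a minimal sketch, not the validator's actual logic (which is not shown here): the hypothetical function `cds_within_mrna` only asks whether every CDS segment lies inside some exon of the mRNA, using the intervals exactly as written in the tbl file.

```python
def cds_within_mrna(mrna_exons, cds_segments):
    """Return True if every CDS segment lies inside some mRNA exon.

    Both arguments are lists of (start, end) pairs copied from the tbl
    file; pairs are normalised so minus-strand features also work.
    """
    def norm(seg):
        return (min(seg), max(seg))

    exons = [norm(e) for e in mrna_exons]
    return all(
        any(lo <= c_lo and c_hi <= hi for lo, hi in exons)
        for c_lo, c_hi in map(norm, cds_segments)
    )
```

For FUN_029528, `cds_within_mrna([(2522815, 2523725)], [(2522901, 2523377)])` returns `True`, so whatever triggers the message is not visible from these coordinates alone. Applied to FUN_029502 above, the same check returns `False`, because the last CDS segment starts at 1626124 while the last mRNA exon starts at 1626122.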

Emiliania_huxleyi_CCMP1516.models-need-fixing.txt
Emiliania_huxleyi_CCMP1516.zip

I'd appreciate any help to be able to finish this annotation :)

ruthpg • Nov 19 '23 23:11