Fatal error: fasta.cpp:594 in ApplyActionReport(...): Assertion failed: row.size() == 8 || old_style
Hello,
I ran into some issues with the conda version of gx. I am using conda because I lack root permissions, which prevents me from running gx under Docker or Singularity. Below are the commands I used and their running logs. I hope you can help.
- gx align -i A08.assembly.fasta.masked --gx-db all.gxi -o A082gx.reports
  Running log:
  GX requires the database to be entirely in RAM to avoid thrashing.
  Consider placing the database files in a non-swappable tmpfs or ramfs.
  See https://github.com/ncbi/fcs/wiki/FCS-GX for details.
  Will prefetch (vmtouch) the database pages to have the OS cache them in main memory.
  Prefetching all.gxs 99%...
  Prefetched all.gxs in 2184.65s; 0.0811651 GB/s. The file is 100% in RAM.
  Prefetching all.gxi 99%...
  Prefetched all.gxi in 562.949s; 0.570597 GB/s. The file is 100% in RAM.
  Processed 718 queries, 1288.11Mbp in 348.734s. (3.69368Mbp/s); num-jobs:12883
- gx taxify -i A082gx.reports --gx-db all.gxi --asserted-div 'eudicots' --db-exclude-locs exclude_locs.tsv -o ttr.reports
  Running log:
  Warning: asserted div 'eudicots' is not represented in the output!
  exclude_locs.tsv is the repeat sequence positions I obtained after using RepeatMasker softmask. Here is the sample data:
  ##[["GX locs",1,1]]
  scaffold_1 1 735
  scaffold_1 729 2037
  scaffold_1 2038 3109
  scaffold_1 3111 3563
  scaffold_1 3565 5741
  scaffold_1 5742 6145
  scaffold_1 6147 6895
  scaffold_1 6897 7648
  scaffold_1 7650 10584
- gx clean-genome -i A08.assembly.fasta.masked --action-report ttr.reports -o modified.fasta
  Running log:
  While loading row 1 of the action report:
  scaffold_1 144006991 0,1573537,343640,0,0 128766594 | Pyrus x bretschneideri 225117 plnt:plants 127731101 126142213 51866 | 3760 plnt:plants 127731101 38505299 17988 | 56867 plnt:mosses 1569340 626402 2903 | 86788 fung:ascomycetes 873231 320337 2749 |
  Fatal error: fasta.cpp:594 in ApplyActionReport(...): Assertion failed: row.size() == 8 || old_style
I successfully ran the first and second steps, but encountered an error in the third step. How should I modify it to run successfully?
Thanks very much!
Best Regards!
Hello,
I have a few comments about the commands/process you posted.
First, Singularity does not require root permissions, so running the containerized FCS-GX that way is an option.
Most importantly, even if you are using FCS-GX outside of the container, i.e. in the conda distribution, you should run it through run_gx.py rather than calling the individual gx commands directly.
Running the gx commands directly causes several problems here:
- gx taxify --db-exclude-locs exclude_locs.tsv is specified improperly: this option takes a set of database sequences to mask for the purposes of contamination assignment, not query sequences.
- gx taxify --asserted-div 'eudicots' uses the incorrect string format.
- gx clean-genome is run on the incorrect input: there is an intermediate step, classify, that operates between taxify and clean-genome.
So these errors/warnings should go away when you execute run_gx.py instead of the individual commands.
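Judging only from the assertion message (row.size() == 8 || old_style), clean-genome appears to expect 8 columns per action-report row, and the taxify output that was passed in has many more. A minimal Python sketch for checking this before running clean-genome is below. It assumes the report is tab-separated and that header or metadata lines start with "#"; the function names are hypothetical and not part of gx, and the check ignores the old_style case.

```python
import io


def first_data_row_width(handle, sep="\t"):
    """Return the number of fields in the first non-empty, non-comment row."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            # Skip blank lines and header/metadata lines (assumed to start with '#').
            continue
        return len(line.split(sep))
    return 0  # no data rows found


def looks_like_action_report(handle, expected=8):
    """True if the first data row has the column count named in the assertion."""
    return first_data_row_width(handle) == expected


if __name__ == "__main__":
    # In practice, pass an open file instead, e.g. open("ttr.reports").
    sample = io.StringIO("##header\n" + "\t".join(["x"] * 12) + "\n")
    print(looks_like_action_report(sample))
```

If the check fails, the file is probably the output of an earlier step (such as taxify) rather than the report clean-genome is meant to consume.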
As a side note, it appears you do not have the database saved in RAM. If you are running multiple genomes, prefetching and caching the database each time will slow you down. I suggest looking at the source code repo and #74 for support on how to proceed here.
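Before moving the database to tmpfs or ramfs, it can help to confirm that the files fit in physical memory at all. The Python sketch below compares the combined size of the database files with the machine's RAM. It is only a rough check under stated assumptions: it relies on os.sysconf (available on Linux and other Unix systems), the headroom_fraction parameter is a hypothetical safety margin rather than a documented FCS-GX requirement, and the function names are illustrative.

```python
import os


def total_size_bytes(paths):
    """Sum the on-disk sizes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def physical_ram_bytes():
    """Total physical memory as reported by the OS (Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(db_bytes, ram_bytes, headroom_fraction=0.1):
    """True if the database fits while leaving some RAM free (assumed margin)."""
    return db_bytes <= ram_bytes * (1.0 - headroom_fraction)


if __name__ == "__main__":
    db_files = ["all.gxi", "all.gxs"]  # file names taken from the log above
    existing = [p for p in db_files if os.path.exists(p)]
    print(fits_in_ram(total_size_bytes(existing), physical_ram_bytes()))
```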
Eric
Eric,
Thank you very much for your detailed response.
I will follow your suggestions and give it a try.
Best Regards!
Bocheng