
Fatal error: fasta.cpp:594 in ApplyActionReport(...): Assertion failed: row.size() == 8 || old_style

Open polchan opened this issue 1 year ago • 2 comments

Hello, I ran into some problems using the conda version of gx. I don't have root permissions, which prevents me from using gx under Docker or Singularity. Below are the commands I used and their running logs. I hope you can help.

  1. gx align -i A08.assembly.fasta.masked --gx-db all.gxi -o A082gx.reports

     Running log:

    GX requires the database to be entirely in RAM to avoid thrashing.
    Consider placing the database files in a non-swappable tmpfs or ramfs.
    See https://github.com/ncbi/fcs/wiki/FCS-GX for details.
    Will prefetch (vmtouch) the database pages to have the OS cache them in main memory.
    
    Prefetching all.gxs 99%...
    Prefetched all.gxs in 2184.65s; 0.0811651 GB/s. The file is 100% in RAM.
    Prefetching all.gxi 99%...
    Prefetched all.gxi in 562.949s; 0.570597 GB/s. The file is 100% in RAM.
    Processed 718 queries, 1288.11Mbp in 348.734s. (3.69368Mbp/s); num-jobs:12883
    
  2. gx taxify -i A082gx.reports --gx-db all.gxi --asserted-div 'eudicots' --db-exclude-locs exclude_locs.tsv -o ttr.reports

     Running log:

    Warning: asserted div 'eudicots' is not represented in the output!
    

    exclude_locs.tsv contains the repeat positions I obtained from RepeatMasker soft-masking. Sample data:

    ##[["GX locs",1,1]]
    scaffold_1      1       735
    scaffold_1      729     2037
    scaffold_1      2038    3109
    scaffold_1      3111    3563
    scaffold_1      3565    5741
    scaffold_1      5742    6145
    scaffold_1      6147    6895
    scaffold_1      6897    7648
    scaffold_1      7650    10584
    
  3. gx clean-genome -i A08.assembly.fasta.masked --action-report ttr.reports -o modified.fasta

     Running log:

    While loading row 1 of the action report:
    scaffold_1      144006991       0,1573537,343640,0,0    128766594       |       Pyrus x bretschneideri  225117  plnt:plants     127731101       126142213   51866    |       3760    plnt:plants     127731101       38505299        17988   |       56867   plnt:mosses     1569340 626402  2903    |       86788   fung:ascomycetes     873231  320337  2749    |
    Fatal error: fasta.cpp:594 in ApplyActionReport(...): Assertion failed: row.size() == 8 || old_style
    

The first and second steps ran successfully, but the third step failed. How should I modify it so that it runs successfully?

Thanks very much!

Best Regards!

polchan, Sep 04 '24 14:09

Hello,

I have a few comments about the commands/process you posted.

First, Singularity does not require root permissions to run containers, so running the containerized FCS-GX that way is still an option.

Most importantly, even if you are using FCS-GX outside the container, i.e. the conda distribution, you should run it through run_gx.py rather than calling the individual gx commands directly.
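As a rough illustration of the single run_gx.py call recommended here, the minimal Python sketch below only assembles and prints the command line; it does not run anything. The flag names (--fasta, --gx-db, --tax-id, --out-dir) are an assumption recalled from the FCS-GX wiki, not something stated in this thread, so confirm them with run_gx.py --help, including whether --gx-db wants the .gxi file or a path prefix. The taxid 225117 is only a placeholder (it is the Pyrus x bretschneideri taxid that appears in the taxify log); substitute the NCBI taxid of your own organism. The helper name build_run_gx_cmd and the output directory gx_out are hypothetical.

```python
import shlex


def build_run_gx_cmd(fasta: str, gx_db: str, tax_id: int, out_dir: str) -> list[str]:
    """Assemble a run_gx.py argv list.

    ASSUMPTION: flag names follow the FCS-GX wiki as we recall it;
    verify them with `run_gx.py --help` before use.
    """
    return [
        "run_gx.py",
        "--fasta", fasta,          # query assembly
        "--gx-db", gx_db,          # database path (file vs. prefix: check the docs)
        "--tax-id", str(tax_id),   # NCBI taxid of the organism being screened
        "--out-dir", out_dir,      # hypothetical output directory
    ]


if __name__ == "__main__":
    # 225117 is a placeholder taxid taken from the taxify log, not a recommendation.
    cmd = build_run_gx_cmd("A08.assembly.fasta.masked", "all.gxi", 225117, "gx_out")
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one shell string means it can be passed straight to subprocess.run without quoting problems; shlex.join is used only to print a copy-pasteable version.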

Running the gx commands directly causes several problems, including:

1. In gx taxify, --db-exclude-locs exclude_locs.tsv is used incorrectly: this option takes a set of database sequences to mask for the purposes of contamination assignment, not query sequences.
2. In gx taxify, --asserted-div 'eudicots' is in the wrong string format.
3. gx clean-genome is run on the wrong input: the taxify output is not an action report. There is an intermediate step, classify, that runs between taxify and clean-genome.

These errors and warnings should go away when you run run_gx.py instead.
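The clean-genome assertion (row.size() == 8 || old_style) suggests that each action-report row is expected to have 8 fields, so a quick pre-flight check can catch a file that is not an action report, such as the taxify output passed in step 3. The minimal Python sketch below assumes tab-separated fields and treats blank lines and lines starting with '#' as non-data lines. The 8 comes from the assertion; the old_style exemption is not modelled because its meaning is not explained here. The function name find_bad_rows is hypothetical, and it numbers data rows from 1, which may or may not match gx's own row numbering.

```python
from typing import Iterable, List, Tuple

# Field count taken from the assertion `row.size() == 8` in fasta.cpp.
EXPECTED_FIELDS = 8


def find_bad_rows(lines: Iterable[str], expected: int = EXPECTED_FIELDS) -> List[Tuple[int, int]]:
    """Return (data_row_number, field_count) for rows whose tab-separated
    field count differs from `expected`.

    ASSUMPTION: fields are tab-separated; blank lines and '#'-prefixed
    lines are headers/comments and are skipped.
    """
    bad = []
    row_no = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        row_no += 1
        n_fields = len(line.split("\t"))
        if n_fields != expected:
            bad.append((row_no, n_fields))
    return bad


if __name__ == "__main__":
    # First 9 fields of the taxify row from the clean-genome log (assumed tab-separated).
    taxify_row = "\t".join([
        "scaffold_1", "144006991", "0,1573537,343640,0,0", "128766594", "|",
        "Pyrus x bretschneideri", "225117", "plnt:plants", "127731101",
    ])
    print(find_bad_rows([taxify_row + "\n"]))  # [(1, 9)]
```

In practice you would pass an open file handle, e.g. find_bad_rows(open("ttr.reports")); a non-empty result means the file is not in the shape clean-genome expects.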

As a side note, it appears the database is not kept resident in RAM. If you are screening multiple genomes, prefetching and caching the database on every run will slow you down. I suggest looking at the source code repo and #74 for guidance on how to proceed.
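To put a number on that slowdown, the timings in the gx align log can be compared directly. The short Python sketch below uses only figures copied from that log. It computes total prefetch time, its ratio to the alignment time, and each file's implied size (prefetch time multiplied by the reported throughput). This is an approximation, in whatever GB unit gx reports. The names PREFETCH_SECONDS, PREFETCH_GB_PER_S, ALIGN_SECONDS and prefetch_summary are illustrative.

```python
# Figures copied from the `gx align` log in the question.
PREFETCH_SECONDS = {"all.gxs": 2184.65, "all.gxi": 562.949}
PREFETCH_GB_PER_S = {"all.gxs": 0.0811651, "all.gxi": 0.570597}
ALIGN_SECONDS = 348.734


def prefetch_summary():
    """Return (total prefetch seconds, prefetch/align ratio, implied sizes in GB)."""
    total = sum(PREFETCH_SECONDS.values())
    # Implied file size = prefetch time x reported throughput (approximate).
    sizes = {name: PREFETCH_SECONDS[name] * PREFETCH_GB_PER_S[name] for name in PREFETCH_SECONDS}
    return total, total / ALIGN_SECONDS, sizes


if __name__ == "__main__":
    total, ratio, sizes = prefetch_summary()
    print(f"prefetch: {total:.1f} s ({total / 60:.1f} min), {ratio:.1f}x the alignment time")
    for name, gb in sizes.items():
        print(f"{name}: ~{gb:.1f} GB")
```

This prints a prefetch time of about 45.8 minutes, roughly 7.9 times the 349 s spent aligning, and implied sizes of about 177.3 GB and 321.2 GB. If the database stays cached between runs (for example on the non-swappable tmpfs or ramfs that the gx log itself suggests), most of that per-genome cost should go away.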

Eric

etvedte, Sep 04 '24 15:09

Eric,

Thank you very much for your detailed response.

I will follow your suggestions and give it a try.

Best Regards!

Bocheng

polchan, Sep 05 '24 01:09