Error: Final process status is permanentFail

Open sekhwal opened this issue 1 year ago • 13 comments

Hi, I am following my previous issue #304, it has been closed.

I am already using 'salmonella' in submol.yaml, but I am not able to get results. When I change genus_species to 'Escherichia coli', pgap keeps running for a very long time with generating the results.

topology: 'circular'
location: 'chromosome'
organism:
    genus_species: 'salmonella'
    strain: 'P1620800_chr'

sekhwal avatar May 07 '24 20:05 sekhwal

1/ Salmonella is not "species", it's "genus".
2/ We lost the functionality of supporting the "genus" option in this release and we are working on restoring it soon.
3/ Case might be important (usually biologists always capitalize genus in binomials, so I am not familiar with this use case).

Please try

genus_species: 'Salmonella enterica'

or other legitimate Salmonella species.

azat-badretdin avatar May 07 '24 22:05 azat-badretdin

Thank you for the information. It works, but when I run it with location: 'plasmid' it generates the same error "Final process status is permanentFail".

Please let me know what change I should make in the submol.yaml file. Here is the information of my current submol.yaml file that I am trying to run for plasmid genome.

topology: 'circular'
location: 'plasmids'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'P1122481'

sekhwal avatar May 08 '24 16:05 sekhwal

location: 'plasmids'

Should be strictly 'plasmid' or 'chromosome'

azat-badretdin avatar May 08 '24 17:05 azat-badretdin
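
The two submol.yaml corrections so far (a full binomial for genus_species, and location restricted to 'plasmid' or 'chromosome') can be checked before launching a run. The following is a minimal sketch in Python, assuming the file's fields have already been loaded into a plain dict. check_submol is a hypothetical helper and not part of PGAP, and the two-word test is only a rough proxy for "a legitimate species".

```python
# Allowed values as stated in the reply above.
ALLOWED_LOCATIONS = {"chromosome", "plasmid"}


def check_submol(submol):
    """Return a list of problems found in a submol-style dict (hypothetical helper)."""
    problems = []

    location = submol.get("location")
    if location not in ALLOWED_LOCATIONS:
        problems.append(f"location {location!r} must be 'plasmid' or 'chromosome'")

    name = submol.get("organism", {}).get("genus_species", "")
    parts = name.split()
    if len(parts) < 2:
        # A bare genus such as 'salmonella' is not accepted in this release.
        problems.append(f"genus_species {name!r} should be a binomial")
    elif not parts[0][0].isupper():
        # Assumption: case may matter, as suggested earlier in the thread.
        problems.append(f"genus in {name!r} should be capitalized")

    return problems
```

An empty list means neither of the two mistakes discussed here was found; it does not guarantee that PGAP will accept the file.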

You can also try using our relatively new way of running pgap.py specified in quick notes, where all the information is in FASTA file and species qualification:

./pgap.py .... -s 'My species' -g My.fasta

In this case you can specify plasmid molecules by appending [location=plasmid] to your FASTA definition lines for corresponding sequences

azat-badretdin avatar May 08 '24 17:05 azat-badretdin
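
For the FASTA-only route, the modifier goes after the SeqID on the definition line, separated by a space. Below is a small Python sketch of appending [location=plasmid] to selected sequences. tag_plasmids and the set of plasmid SeqIDs are illustrative assumptions, not something PGAP provides.

```python
def tag_plasmids(fasta_lines, plasmid_ids):
    """Append ' [location=plasmid]' to definition lines whose SeqID is in plasmid_ids."""
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The SeqID is everything between '>' and the first whitespace.
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            # Skip lines that already carry a location modifier.
            if seq_id in plasmid_ids and "[location=" not in line:
                line = f"{line} [location=plasmid]"
        out.append(line)
    return out
```

Sequence lines pass through unchanged, so the result can be written straight back to a new FASTA file.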

I tried the following way:

python3 /scripts/pgap.py -r -o P1122481_results -s 'Salmonella enterica' -g P1122481.fasta

I am using the fasta file with the header

1_length=4998493_depth=1.00x_circular=true_[location=chromosome]

But still generating the issue "Final process status is permanentFail".

sekhwal avatar May 08 '24 19:05 sekhwal

Alternatively, I used the correct location: 'plasmid' in the submol.yaml, but it is still unable to run.

topology: 'circular'
location: 'plasmid'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'P1122481'

sekhwal avatar May 08 '24 20:05 sekhwal

1_length=4998493_depth=1.00x_circular=true_[location=chromosome]

Please review https://github.com/ncbi/pgap/wiki/Input-Files#Genome-assembly-sequence-file. There are several characters that are not allowed in this SeqID (the SeqID is everything before the first space). You can try a SeqID of 1 and add modifiers:

1 [topology=circular] [location=chromosome]

Length and depth are not supported modifiers according to: https://www.ncbi.nlm.nih.gov/genbank/mods_fastadefline/

thibaudnis avatar May 08 '24 20:05 thibaudnis
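
To see which characters in a definition line are likely to be rejected, the SeqID can be isolated and scanned. This is a rough Python sketch. The allowed character set below (letters, digits, and . _ : * # -) is an assumption recalled from NCBI's general FASTA guidance, so the wiki page linked above remains the authority. seqid_problems is a hypothetical helper.

```python
import re

# Assumed set of characters permitted in a SeqID; verify against the PGAP wiki.
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9._:*#-]")


def seqid_problems(defline):
    """Return (seq_id, sorted list of characters outside the assumed allowed set)."""
    # The SeqID is everything before the first space, without the leading '>'.
    seq_id = defline.lstrip(">").split(" ", 1)[0]
    bad = sorted(set(ALLOWED_CHAR.sub("", seq_id)))
    return seq_id, bad
```

With the original header, the whole string counts as the SeqID because it contains no space, and the characters flagged under this assumption are '=', '[' and ']'. With the suggested form, the SeqID is just 1 and nothing is flagged.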

But still generating the issue "Final process status is permanentFail".

Could you please post the resulting cwltool.log file? Thanks!

azat-badretdin avatar May 09 '24 08:05 azat-badretdin

It seems the header line is correct, yet it still shows the error "WARNING Final process status is permanentFail" with the plasmid sequence. However, it works with 'chromosome', even though I did not change anything in the header ">1 length=4998493 depth=1.00x circular=true".

##used command

python3 /scripts/pgap.py -r -o P1122481_plasmid input_P1122481_plasmid.yaml

##plasmid fasta file header

contig001 [location = plasmid] [plasmid-name = pPSU1122481] [topology=circular]

##Here is the .yaml file

fasta:
    class: File
    location: P1122481_plasmid.fasta
submol:
    class: File
    location: P1122481_plasmid1_submol.yaml

cwltool.log

topology: 'circular'
location: 'plasmid'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'pPSU1122481'

sekhwal avatar May 09 '24 16:05 sekhwal

It seems it does not work with small genomes such as plasmids. I used pgap earlier and it worked perfectly without my having to worry about any specific header or special characters. Should I download an old version and try that?

sekhwal avatar May 09 '24 16:05 sekhwal

Try ./pgap.py --ignore-errors ....

azat-badretdin avatar May 09 '24 17:05 azat-badretdin

It works when I use both chromosome and plasmid in one fasta file. I think the latest pgap version has an issue with small genomes such as plasmids.

##command

python3 /scripts/pgap.py -r -o P2226300_results input_P2226300.yaml

Thank you for your help!

sekhwal avatar May 10 '24 17:05 sekhwal

It works when I use both chromosome and plasmid in one fasta file.

Because with chromosome, the total size of the genome matches the expectation for this particular species.

It does not reject plasmids per se. You can try replacing the keyword plasmid with chromosome in that small plasmid FASTA file and see for yourself: the result will be the same, because it rejects by size, not by molecule type.

Have you tried inserting --ignore-errors into the list of command line switches?

azat-badretdin avatar May 10 '24 19:05 azat-badretdin
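
Since the rejection is by total size rather than by molecule type, it can help to check how many bases a FASTA file contains before submitting it. Here is a minimal Python sketch. total_bases is a hypothetical helper, and the size range PGAP expects for a given species is not stated in this thread, so the number is only useful for comparing against a typical genome size for the organism.

```python
def total_bases(fasta_lines):
    """Sum the lengths of all sequence lines, ignoring definition lines and blanks."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total
```

A plasmid-only file will give a total far below that of a file that also holds the chromosome, which is consistent with the combined file passing while the plasmid-only file fails.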

@azat-badretdin I have a similar issue. Please find attached my cwltool.log file: cwltool.log

vappiah avatar May 13 '24 20:05 vappiah

User @vappiah I am not so sure. It says

'contig001[location=chromosome]' is not a valid local ID (m_Pos = 1)

which most likely means that you omitted the crucial space delimiter separating the seq-id from the rest of the FASTA definition line.

It's a different error from the same ballpark "things that users do in FASTA definition line"

azat-badretdin avatar May 14 '24 01:05 azat-badretdin
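
A missing space of this kind is easy to detect and repair mechanically. This Python sketch inserts a space before the first '[' when a bracketed modifier is glued to the seq-id. fix_missing_space is a hypothetical helper, and it assumes the seq-id itself never legitimately contains '['.

```python
def fix_missing_space(defline):
    """Insert a space between the seq-id and a modifier attached directly to it."""
    prefix = ">" if defline.startswith(">") else ""
    body = defline[len(prefix):]
    # The first whitespace-delimited token is what gets parsed as the seq-id.
    first_token = body.split(" ", 1)[0]
    if "[" in first_token:
        idx = body.index("[")
        body = body[:idx] + " " + body[idx:]
    return prefix + body
```

Definition lines that already have the space are returned unchanged.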

Thanks @azat-badretdin . I made the necessary correction and it works now.

vappiah avatar May 14 '24 12:05 vappiah

Glad to hear that, user @vappiah !

azat-badretdin avatar May 14 '24 12:05 azat-badretdin