
Segmentation fault during polishing step

Open hasindu2008 opened this issue 5 years ago • 4 comments

Hi,

I have recently been trying to polish a draft human genome assembly constructed from a PromethION sample. However, the polishing step fails with a segmentation fault. Any suggestions on how to fix this?

Command used:

CONSENT-polish  --contigs $DRAFT --reads $READS  --out $OUTPUT  --nproc 64 -m 50G

Stdout:

[Wed Sep 16 14:07:21 AEST 2020] Aligning the long reads to the contigs (minimap2)
[Wed Sep 16 16:28:59 AEST 2020] Sorting the overlaps
[Thu Sep 17 20:11:04 AEST 2020] Polishing the contigs

Stderr:

[M::mm_idx_gen::84.924*1.86] collected minimizers
[M::mm_idx_gen::92.015*3.89] sorted minimizers
[M::main::92.015*3.89] loaded/built the index for 3855 target sequence(s)
[M::mm_mapopt_update::96.598*3.76] mid_occ = 667
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 3855
[M::mm_idx_stat::99.125*3.69] distinct minimizers: 165963083 (35.72% are singletons); average occurrences: 5.831; average spacing: 2.922
[M::worker_pipeline::174.006*15.81] mapped 87711 sequences
......
[M::worker_pipeline::8492.457*27.15] mapped 25672 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t64 -I50G assembly.fasta pass.fastq
[M::main] Real time: 8495.410 sec; CPU: 230553.598 sec; Peak RSS: 37.940 GB
CONSENT-polish: line 203: 39123 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT-polishing -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$contigs" -R "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"
Command exited with non-zero status 139

hasindu2008, Sep 18 '20 01:09

Hi,

Do you know if CONSENT crashes right away when starting the polishing step?

Usual errors include using FASTQ reads when CONSENT only supports FASTA (but when polishing an assembly, I'm pretty sure your contigs are FASTA), or not using a "one sequence per line" formatted FASTA file. Can you check whether or not your input file is in "one sequence per line" format?
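One way to run that check is sketched below in Python. This is not part of CONSENT; the function name `is_single_line_fasta` is hypothetical, and the sketch assumes "one sequence per line" means each `>` header is followed by exactly one line of sequence.

```python
def is_single_line_fasta(lines):
    """Return True if every '>' header is followed by exactly one sequence line.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    seq_lines = None  # sequence lines seen for the current record; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Closing the previous record: it must have had exactly one sequence line.
            if seq_lines is not None and seq_lines != 1:
                return False
            seq_lines = 0
        else:
            if seq_lines is None:
                # Data before any '>' header, e.g. a FASTQ file starting with '@'.
                return False
            seq_lines += 1
    return seq_lines == 1
```

Calling it as `is_single_line_fasta(open("assembly.fasta"))` streams the file line by line, so it should also be usable on large read sets.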

Best, Pierre

morispi, Sep 21 '20 13:09

As I ran this as a batch job, I could not determine exactly when the crash occurred.

My reads are in FASTQ and the contigs are in FASTA. Should the reads also be in FASTA? My contigs seem to be in multi-line FASTA, so maybe that is the problem.

hasindu2008, Sep 22 '20 01:09

Yes, both reads and contigs should be in FASTA, and both should also be in "one sequence per line" format. That seems to be the source of the problem. Can you update me if you try again after converting everything to "one sequence per line" FASTA?
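For reference, the conversion could be done with a short script along these lines. This is only a sketch, not a tool shipped with CONSENT: the function names are hypothetical, the format is guessed from the first character of the input, and FASTQ input is assumed to use strict four-line records with no blank lines.

```python
from itertools import chain


def fasta_records(lines):
    """Yield (name, sequence) pairs from FASTA, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fastq_records(lines):
    """Yield (name, sequence) pairs from FASTQ (assumption: strict 4-line records)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        seq = next(it, "").strip()
        next(it, None)  # skip the '+' separator line
        next(it, None)  # skip the quality line
        yield header[1:], seq


def to_single_line_fasta(lines):
    """Yield FASTA text with one header line and one sequence line per record."""
    it = iter(lines)
    first = next(it, "")
    # '@' as the first character is taken to mean FASTQ; anything else is parsed as FASTA.
    parse = fastq_records if first.startswith("@") else fasta_records
    for name, seq in parse(chain([first], it)):
        yield f">{name}\n{seq}\n"
```

To use it as a filter, something like `sys.stdout.writelines(to_single_line_fasta(sys.stdin))` (after `import sys`) would read the original file on standard input and write the converted FASTA to standard output. Quality values are discarded, since FASTA has no place for them.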

Best, Pierre

morispi, Sep 22 '20 05:09

Hi,

I recently updated CONSENT. It now accepts both FASTA and FASTQ as input, and sequences are no longer required to be in "one sequence per line" format. I believe this should fix your issue.

I am leaving the issue open for now, so don't hesitate to update me if you encounter further errors.

Best, Pierre

morispi, Dec 09 '20 12:12