NextPolish unfinished
Question or Expected behavior
I launched NextPolish to correct a genome with PacBio data; I got a partial correction, but the process never ends.
This is the command:
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif python /opt/NextPolish/lib/nextpolish2.py -g /mnt/60444-hybrid-complete.fasta -l /mnt/pb.map.bam.fofn -r clr -p 25 -a -s -o /mnt/pb.asm.nextpolish1.fa
It produced a partial pb.asm.nextpolish1.fa with 6389 contigs (while the genome has 6402 contigs). I got this message, and nothing has happened for 2 days: gap_aln:0 sup_aln:0 depth_cluster:0 gap_cluster:0 split_count:0 bin_len:0 median_depth:0
Operating system
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Singularity 3, CentOS 7
GCC
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
Python
What version of Python are you using?
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif python --version
Python 3.6.8
NextPolish
What version of NextPolish are you using?
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif nextPolish -v
nextPolish v1.2.4
Additional context (Optional)
My computer has 64 Gb of RAM; could that be the issue (for a 1.5 Gb genome)?
Thanks for the help, Luc
- The maximum memory required by NextPolish depends on the value of -p; each sub-process requires 2-5 Gb of RAM depending on the mapped data depth. So 64 Gb of RAM is enough if you remove the -a option, which will automatically adjust -p.
- Some sub-processes may have crashed (usually caused by insufficient memory), which blocks the main process. This is a bug in Python's multiprocessing module, so you should kill the main task and simply rerun it. It will skip already-corrected seqs and continue running.
- It is better to update NextPolish to the latest version and try again. The polishing step is very fast; it should finish in a few minutes depending on your genome size and -p.
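Following that advice, a rerun of the reported command with -a and -p removed might look like the sketch below (an untested command fragment; the paths and filenames are copied from the report above, so adjust them to your setup):

```shell
# Same nextpolish2.py call as before, but without -a and -p,
# so NextPolish adjusts the sub-process count to the available RAM.
singularity exec --bind /data1/nextpolish:/mnt \
    /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif \
    python /opt/NextPolish/lib/nextpolish2.py \
        -g /mnt/60444-hybrid-complete.fasta \
        -l /mnt/pb.map.bam.fofn \
        -r clr -s \
        -o /mnt/pb.asm.nextpolish1.fa
```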
Indeed, deleting the -p and -a options made it work again. I relaunched the command and it works.
However, my corrected file, pb.asm.nextpolish1.fa, contains the same deflines multiple times.
$ grep scaffold_300 pb.asm.nextpolish1.fa
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
Is it because I ran the pipeline multiple times? Can I just keep one of the sequences?
Thanks, Luc
In theory, NextPolish will skip polished contigs, so it should not produce duplicate IDs; I need more time to debug this. For now, it is safer to remove pb.asm.nextpolish1.fa and rerun this step; it should finish in 10-30 minutes if the mapped depth is < 100.
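If rerunning is not an option and you only want to keep the first copy of each duplicated record, a minimal stdlib sketch (not part of NextPolish; `dedupe_fasta` is a hypothetical helper name) could filter the FASTA by defline:

```python
def dedupe_fasta(lines):
    """Yield FASTA lines, keeping only the first record per defline."""
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A record is kept only the first time its defline appears.
            keep = line not in seen
            seen.add(line)
        if keep:
            yield line

fasta = [
    ">scaffold_300 49236", "ACGT",
    ">scaffold_300 49236", "ACGT",   # duplicate record, dropped
    ">scaffold_301 120", "TTTT",
]
print(list(dedupe_fasta(fasta)))
# → ['>scaffold_300 49236', 'ACGT', '>scaffold_301 120', 'TTTT']
```

Note this only deduplicates records whose deflines are byte-identical; if the duplicates differ in sequence content, rerunning the step as advised above is the safer option.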