NextPolish unfinished
Question or Expected behavior
I launched NextPolish to correct a genome with PacBio data; I got a partial correction, but the process never ends.
This is the command:
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif python /opt/NextPolish/lib/nextpolish2.py -g /mnt/60444-hybrid-complete.fasta -l /mnt/pb.map.bam.fofn -r clr -p 25 -a -s -o /mnt/pb.asm.nextpolish1.fa
It produced a partial pb.asm.nextpolish1.fa with 6389 contigs (while the genome has 6402 contigs). I got this message, and nothing has happened for 2 days: gap_aln:0 sup_aln:0 depth_cluster:0 gap_cluster:0 split_count:0 bin_len:0 median_depth:0
Operating system
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Singularity 3, CentOS 7
GCC
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
Python
What version of Python are you using?
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif python --version
Python 3.6.8
NextPolish
What version of NextPolish are you using?
$ singularity exec --bind /data1/nextpolish:/mnt /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif nextPolish -v
nextPolish v1.2.4
Additional context (Optional)
My computer has 64 Gb of RAM; could that be the issue (for a 1.5 Gb genome)?
Thanks for the help, Luc
- The maximum memory required by NextPolish depends on the value of -p; each sub-process requires 2-5 Gb of RAM depending on the mapped data depth. So 64 Gb of RAM is enough if you remove the -a option, which will automatically adjust -p.
- Some sub-processes may have crashed (usually caused by insufficient memory), which blocks the main process. This is a bug in Python's multiprocessing module, so you should kill the main task and simply rerun it. It will skip already-corrected seqs and continue running.
- It is better to update NextPolish to the latest version and try again. The polishing step is very fast; it should finish in a few minutes depending on your genome size and -p.
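Following that advice, a rerun of the reported command with -a and -p removed might look like the sketch below (an untested command fragment; the paths and filenames are copied from the report above, so adjust them to your setup):

```shell
# Same nextpolish2.py call as before, but without -a and -p,
# so NextPolish adjusts the sub-process count to the available RAM.
singularity exec --bind /data1/nextpolish:/mnt \
    /home/ulg/Documents/program/test-singu/nphase/nextpolish.sif \
    python /opt/NextPolish/lib/nextpolish2.py \
        -g /mnt/60444-hybrid-complete.fasta \
        -l /mnt/pb.map.bam.fofn \
        -r clr -s \
        -o /mnt/pb.asm.nextpolish1.fa
```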
Indeed, deleting the -p and -a options made it work again. I relaunched the command and it works.
However, my corrected file, pb.asm.nextpolish1.fa, contains the same deflines multiple times.
$ grep scaffold_300 pb.asm.nextpolish1.fa
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
>scaffold_300_pilon_pilon_subseq_1:49068_obj 49236
Is it because I ran the pipeline multiple times? Can I just keep one of the sequences?
Thanks, Luc
In theory, NextPolish will skip polished contigs, so it should not produce duplicate IDs; I need more time to debug this. For now, it is safer to remove pb.asm.nextpolish1.fa and rerun this step; it should finish in 10-30 minutes if the mapped depth is < 100.
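If rerunning is not an option and you only want to keep the first copy of each duplicated record, a minimal stdlib sketch (not part of NextPolish; `dedupe_fasta` is a hypothetical helper name) could filter the FASTA by defline:

```python
def dedupe_fasta(lines):
    """Yield FASTA lines, keeping only the first record per defline."""
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A record is kept only the first time its defline appears.
            keep = line not in seen
            seen.add(line)
        if keep:
            yield line

fasta = [
    ">scaffold_300 49236", "ACGT",
    ">scaffold_300 49236", "ACGT",   # duplicate record, dropped
    ">scaffold_301 120", "TTTT",
]
print(list(dedupe_fasta(fasta)))
# → ['>scaffold_300 49236', 'ACGT', '>scaffold_301 120', 'TTTT']
```

Note this only deduplicates records whose deflines are byte-identical; if the duplicates differ in sequence content, rerunning the step as advised above is the safer option.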