FastOMA

nextflow error

Open Song-10-YF opened this issue 1 year ago • 7 comments

executor > local (4)
[6f/512ff1] check_input (1) | 3 of 4, failed: 3, retries: 3
[- ] omamer_run -
[- ] infer_roothogs -
[- ] batch_roothogs -
[- ] hog_big -
[- ] hog_rest -
[- ] collect_subhogs -
[- ] extract_pairwise_ortholog_relations -
[- ] fastoma_report -
[da/21a398] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (1)
[cf/393603] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (2)
[b6/087b8b] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (3)
ERROR ~ Error executing process > 'check_input (1)'

executor > local (4)
[6f/512ff1] check_input (1) | 4 of 4, failed: 4, retries: 3 ✘
[- ] omamer_run -
[- ] infer_roothogs -
[- ] batch_roothogs -
[- ] hog_big -
[- ] hog_rest -
[- ] collect_subhogs -
[- ] extract_pairwise_ortholog_relations -
[- ] fastoma_report -
[da/21a398] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (1)
[cf/393603] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (2)
[b6/087b8b] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (3)
ERROR ~ Error executing process > 'check_input (1)'

Caused by: Process check_input (1) terminated with an error exit status (1)

Command executed:

fastoma-check-input --proteomes proteome --species-tree species_tree.nwk --out-tree species_tree_checked.nwk --splice splice --hogmap hogmap_in --omamer_db LUCA.h5 -vv

Command exit status: 1

Command output: (empty)

Command error:
2025-01-06 21:21:03 DEBUG Arguments: Namespace(proteomes='proteome', species_tree='species_tree.nwk', out_tree='species_tree_checked.nwk', splice='splice', hogmap='hogmap_in', omamer_db='LUCA.h5', v=2)
2025-01-06 21:21:03 INFO There are 3 files in the proteome folder.
2025-01-06 21:21:03 WARNING We expect that only fa/fasta files are in the proteome folder. Better to remove these ['TCP.pep', 'FN.pep', 'DCP.pep']
2025-01-06 21:21:03 ERROR There are not enough proteomes in the folder
2025-01-06 21:21:03 ERROR Check input failed. FastOMA halted!
2025-01-06 21:21:03 ERROR Halting FastOMA because of invalid proteome input data

Work dir: /home/songyf/software/FastOMA/work/6f/512ff1e24ecfb95498cfe18fdda78f

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details

Hello, I encountered the above errors while running locally. Why is this happening?

Song-10-YF, Jan 06 '25 13:01

Hi @Song-10-YF,

The input proteome files should be in FASTA format, with file names ending in .fa. According to the log, your files are named TCP.pep, FN.pep, and DCP.pep, so they are not counted as proteomes. Note that a (rough) species tree in Newick format is also needed.
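If the .pep files already contain protein sequences in FASTA format and only the extension is wrong, renaming them should be enough. Below is a minimal Python sketch under that assumption. The helper name rename_proteomes and the example path Cpal/proteome are made up for illustration and are not part of FastOMA.

```python
from pathlib import Path


def rename_proteomes(folder, old_ext=".pep", new_ext=".fa"):
    """Rename every file ending in old_ext inside folder to end in new_ext.

    Hypothetical helper, not part of FastOMA. It only changes the file
    extension and assumes the content is already FASTA-formatted.
    Returns the list of new file names in alphabetical order.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == old_ext:
            target = path.with_suffix(new_ext)
            if target.exists():
                # Refuse to overwrite an existing proteome file
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renamed.append(target.name)
    return renamed


# Example (assumed path): rename_proteomes("Cpal/proteome")
# would rename TCP.pep, FN.pep and DCP.pep to TCP.fa, FN.fa and DCP.fa.
```

Files with other extensions are left untouched, so it is still worth checking the folder afterwards for anything that is not a proteome.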

Best, Sina

sinamajidian, Jan 06 '25 14:01

Thanks! But after changing the file extension, the run has been stuck for nearly 10 hours at the step that stages the foreign file https://omabrowser.org/All/LUCA.h5.

Song-10-YF, Jan 07 '25 04:01

Hi,

The LUCA.h5 file is ~8.8 GB. Depending on your internet connection, the download might take some time. Also, if you're running this on an HPC cluster, please make sure that the node from which you run Nextflow actually has access to the internet.

To check whether the pipeline otherwise works, you could also use a smaller OMAmer database, e.g. https://omabrowser.org/All/Primates.h5 (~100 MB).

alpae, Jan 07 '25 12:01

[2790.780s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
executor > local (52)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[b2/8feefb] hog_big (11) | 0 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
[2790.839s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[2790.848s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[2790.855s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 136k, guardsize: 0k, detached.
[2790.861s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

executor > local (53)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[d4/d39ff0] hog_big (1) | 0 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
ERROR ~ Execution aborted due to an unexpected error

-- Check '.nextflow.log' file for details

Completed at : 2025-01-07T13:14:12.143116+08:00
Duration : 46m 22s
Processes : 7 (success), 0 (failed)
Output in : Cpal_out
Nextflow report : Cpal_out/stats
Oops .. something went wrong

executor > local (53)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[f7/b4345a] hog_big (12) | 1 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
ERROR ~ Execution aborted due to an unexpected error

-- Check '.nextflow.log' file for details
WARN: Killing running tasks (46)

Song-10-YF, Jan 08 '25 08:01

Hi, this looks like a problem with Nextflow itself. Which operating system are you using, and which profile? What is reported in the .nextflow.log file?

alpae, Jan 08 '25 08:01

I'm using the command:

nextflow run FastOMA.nf --input_folder Cpal --output_folder Cpal_out --report

My system is CentOS. The version of Nextflow is 24.10.3.5933. I suspect the error might be due to insufficient threads.

Thread[process reaper,10,system]
[email protected]/java.lang.ProcessHandleImpl.waitForProcessExit0(Native Method)
[email protected]/java.lang.ProcessHandleImpl$1.run(ProcessHandleImpl.java:138)
[email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[email protected]/java.lang.Thread.run(Thread.java:829)

Jan-07 13:14:12.104 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-07 13:14:12.114 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-07 13:14:12.159 [main] INFO nextflow.script.BaseScript -
Jan-07 13:14:12.160 [main] INFO nextflow.script.BaseScript - Completed at : 2025-01-07T13:14:12.143116+08:00
Jan-07 13:14:12.161 [main] INFO nextflow.script.BaseScript - Duration : 46m 22s
Jan-07 13:14:12.162 [main] INFO nextflow.script.BaseScript - Processes : 7 (success), 0 (failed)
Jan-07 13:14:12.163 [main] INFO nextflow.script.BaseScript - Output in : Cpal_out Nextflow report : Cpal_out/stats
Jan-07 13:14:12.163 [main] INFO nextflow.script.BaseScript - Oops .. something went wrong
Jan-07 13:14:12.184 [main] WARN n.processor.TaskPollingMonitor - Killing running tasks (46)
Jan-07 13:14:12.277 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 23; name: hog_rest (4); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/19/c9b0cb8a8f6446ef23795a43188ce4] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.279 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 55; name: hog_rest (36); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/4b/094b791afe49b11d1aa9479f823bc6] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.280 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 58; name: hog_rest (39); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/ee/b30551c715a9e154973b1c06f4c24e] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.281 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 39; name: hog_rest (20); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/64/0547406d1e0a8322eecf9d5a0f50a6] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.282 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 7; name: hog_rest (1); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/76/43ee1bcaac41291d4ad3c97a275816] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.314 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 49; name: hog_rest (30); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/66/3b3f49cd341f6d4c5bffed22b0f90d] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:13.197 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=7; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=9; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=46; succeedDuration=53m 43s; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=20 GB; peakRunning=47; peakCpus=59; peakMemory=592 GB; ]
Jan-07 13:14:13.197 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Jan-07 13:14:13.200 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jan-07 13:14:14.563 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Jan-07 13:14:14.890 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-07 13:14:14.924 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Song-10-YF, Jan 08 '25 08:01

Thanks for sharing the logs. It seems that the system ran out of threads, which should not happen normally.

  1. Is the system you are using shared with many users, with many processes running? (You can check with top or htop.)
  2. Could you run it again? It might be a one-time issue with the system.
  3. Also, can you run cat /proc/sys/kernel/threads-max? For me, the output is 12382340.
  4. By the way, have you tried running on the test dataset provided on GitHub?
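The limits behind a pthread_create EAGAIN failure can also be inspected from Python. The sketch below assumes a Linux system. Besides the system-wide threads-max value, it reads pid_max and the per-user process limit (RLIMIT_NPROC, the value that ulimit -u reports), which the pthread_create man page lists as further possible causes of this error. The function name thread_limits is only illustrative.

```python
import resource
from pathlib import Path


def thread_limits():
    """Collect limits that can make pthread_create fail with EAGAIN (Linux)."""

    def read_int(path):
        # Return the integer stored in a /proc file, or None if it cannot be read
        try:
            return int(Path(path).read_text().strip())
        except (OSError, ValueError):
            return None

    # Per-user limit on processes and threads; RLIM_INFINITY means unlimited
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
        "nproc_soft": soft,
        "nproc_hard": hard,
    }


for name, value in thread_limits().items():
    print(f"{name}: {value}")
```

If nproc_soft is far below threads_max, the per-user limit rather than the system-wide one is the more likely bottleneck, although this sketch cannot confirm which limit was actually hit during the failed run.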

You can add -resume to nextflow run to resume your previous run. Sometimes it is better to start over from an empty folder. Note that Nextflow creates a work folder and some hidden files (you can check with ls -a).

By the way, you can add --omamer_db LUCA.h5 (or Primates.h5) to the command line so that Nextflow does not download the file again, which avoids the staging step.
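Putting these options together, the rerun command could be assembled as follows. This sketch only builds and prints the command line. The flags are the ones quoted in this thread, the helper name fastoma_command is hypothetical, and the database is assumed to have been downloaded already to the given local path.

```python
import shlex


def fastoma_command(input_folder, output_folder, omamer_db=None, resume=False):
    """Build the nextflow command line from this thread as a list of arguments."""
    cmd = [
        "nextflow", "run", "FastOMA.nf",
        "--input_folder", input_folder,
        "--output_folder", output_folder,
        "--report",
    ]
    if omamer_db is not None:
        # Point to an already downloaded database to avoid the staging step
        cmd += ["--omamer_db", omamer_db]
    if resume:
        # Continue from the previous run instead of starting over
        cmd.append("-resume")
    return cmd


cmd = fastoma_command("Cpal", "Cpal_out", omamer_db="Primates.h5", resume=True)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the run from Python, or the printed line can be pasted into a shell.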

sinamajidian, Jan 08 '25 15:01