ClusterManagers.jl
Just copy-pasted from the README. I think this resolves the issue with TagBot not triggering, but someone please verify, since this is a bit of a hit-and-run.
`qsub.jl` passes incorrect `qsub` arguments for PBS. We're currently on PBS v20, whose `qsub` does not accept `-wd` or `-t`.
I am running `addprocs_sge` even though my cluster is UGE. Not sure if this is the reason my code isn't working. If I use an interactive node, I can do...
I sometimes get the following failure when adding LSF workers: `ClusterManagers.LSFException("LSF daemon (LIM) not responding ... still trying")` I'm not sure if this message is the same on all systems,...
Hi, after adding and using 4 processors, ClusterManagers fails when it's time to remove the procs. Below is an MWE ``` using ClusterManagers using Distributed N_JOBS = 4 addprocs_sge(N_JOBS; queue="single.q")...
Currently the LSFManager adopts a strategy where `bsub` is called multiple times to launch a job for each worker process. It is also possible to create a single job with...
See the recent comment on the unduly closed issue #107! In a nutshell: the telnet connection between the worker node and the master node fails: ` telnet: connect to address 192.168.1.3: Connection refused...
`condor.jl` uses telnet, but the job execution environment sometimes doesn't have telnet installed.
The ElasticManager does not export `get_connect_cmd`. I would like to start the other processes in some automated way, but the info is now only accessible through `show(em)`. It is possible...
There is not much shared code between the managers and most of us only use a single workload/cluster manager so it is difficult to review PRs. ### Finished: - [x]...