Handling of busy LSF daemon
I sometimes get the following failure when adding LSF workers:
ClusterManagers.LSFException("LSF daemon (LIM) not responding ... still trying")
I'm not sure if this message is the same on all systems, but if it is, then maybe it's worth adding to the set of expected responses here.
I have not seen that error message. What version of LSF are you using? Mine is:
$ lsid
IBM Spectrum LSF Standard 10.1.0.9, Oct 16 2019
Copyright International Business Machines Corp. 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
This is mine:
$ lsid
IBM Spectrum LSF Standard 10.1.0.10, Apr 10 2020
Just to clarify: this message appears every now and then in response to any LSF command (e.g. bsub, bjobs, bhosts, etc.) and is repeated every few seconds. After a while (at most a few minutes), the LSF command executes successfully.
It might be some (potentially home-made) overload protection. I will ask my admins about it.
My admins responded that the message most likely appears when the server is restarted in conjunction with a reconfiguration.
Maybe one could use this list to filter out the non-fatal messages? I'm pretty sure I have also seen the "LSF is processing your request. Please wait…" message.
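A rough sketch of the filtering idea follows. It assumes that the two messages quoted in this thread are the only non-fatal ones and that each appears on its own output line. The helper name `filter_lsf_noise` is hypothetical and not part of LSF or ClusterManagers. The package itself would need an equivalent check in its Julia code; this shell version only illustrates the idea:

```shell
# Hypothetical helper, not part of LSF or ClusterManagers: reads LSF command
# output on stdin and drops lines containing the transient status messages
# reported in this thread. All other lines are passed through unchanged.
filter_lsf_noise() {
    # -v: print only non-matching lines; -F: fixed strings, so the
    # parentheses in "(LIM)" need no escaping.
    grep -v -F \
        -e 'LSF daemon (LIM) not responding' \
        -e 'LSF is processing your request'
    _filter_rc=$?
    # grep exits with 1 when every line was filtered out. That is not an
    # error here, so only a real grep failure (status >= 2) is propagated.
    if [ "$_filter_rc" -le 1 ]; then
        return 0
    fi
    return "$_filter_rc"
}

# Example usage: capture output and exit status first, then filter, so the
# exit status of the LSF command itself is not lost in a pipeline.
# output=$(bsub sleep 60 2>&1); bsub_rc=$?
# printf '%s\n' "$output" | filter_lsf_noise
```

Matching substrings rather than whole lines is a deliberate choice. As noted above, the wording may not be the same on all systems, so exact-line matching would likely be too brittle.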