Nimby while loop preventing RQD startup
Describe the bug
Nimby startup can get stuck in a while loop, which prevents RQD from starting up properly if no one is using the host when RQD restarts.

To Reproduce
At least for us on CentOS 7: have a user logged into a desktop host, then restart the rqd service while no one is using the machine, so that the idle while loop keeps running.

Expected behavior
Nimby should not prevent rqd startup.

Additional context
We have fixed this in our build by using the same thread timer approach as the other functions in the same class.

For class NimbySelect(Nimby):
    def unlockedIdle(self):
        """Nimby State: Machine is idle, host is unlocked,
        waiting for user activity"""
        log.warning("UnlockedIdle Nimby")
        if self.active and (not self.results[0] == [] or not self.rqCore.machine.isNimbySafeToRunJobs()):
            log.warning("Is active, locking Nimby")
            self.closeEvents()
            self.lockNimby()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.lockedInUse)
            self.thread.start()
        elif self.active:
            try:
                self.openEvents()
                self.results = select.select(self.fileObjList, [], [], 5)
            # pylint: disable=broad-except
            except Exception:
                log.exception("failed to execute nimby check event")
            if not self.rqCore.machine.isNimbySafeToRunJobs():
                log.warning("memory threshold has been exceeded, locking nimby")
                self.active = True
            self.closeEvents()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.unlockedIdle)
            self.thread.start()

For class NimbyPynput(Nimby):

    def unlockedIdle(self):
        """Nimby State: Machine is idle, host is unlocked,
        waiting for user activity"""
        log.warning("UnlockedIdle")
        if self.active and (self.interaction_detected or not self.rqCore.machine.isNimbySafeToRunJobs()):
            log.warning("Is active, locking Nimby")
            self.lockNimby()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.lockedInUse)
            self.thread.start()
        elif self.active:
            if not self.rqCore.machine.isNimbySafeToRunJobs():
                log.warning("memory threshold has been exceeded, locking nimby")
                self.active = True
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.unlockedIdle)
            self.thread.start()

I'm currently a bit too busy to submit the fix as a PR, my apologies.
Also, I realise this is quite hard to test on a single machine; if you remotely restart rqd on a host other than your own, you can reproduce it.
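
For anyone who wants to see the idea without a second host, here is a minimal standalone sketch of the timer pattern the fix relies on: do a single check, then return and reschedule the same method on a `threading.Timer`, so the caller is never held up while the host stays idle. Everything in it is a toy stand-in and not RQD code: `TimerPoller`, `is_user_active`, `start_service` and `CHECK_INTERVAL` are hypothetical names, and marking the timer as a daemon thread is my own choice for the demo.

```python
import threading
import time

# Hypothetical stand-in for rqd.rqconstants.CHECK_INTERVAL_LOCKED (seconds).
CHECK_INTERVAL = 0.05


class TimerPoller:
    """Toy idle-state poller; not the real Nimby class."""

    def __init__(self, is_user_active):
        self.is_user_active = is_user_active  # callable returning bool
        self.active = True
        self.locked = False
        self.checks = 0
        self.thread = None

    def unlocked_idle(self):
        """Run one check, then either lock or reschedule this method."""
        if not self.active:
            return
        self.checks += 1
        if self.is_user_active():
            self.locked = True
            return
        # No activity: return right away and run the next check on a timer
        # thread instead of looping in the caller's thread.
        self.thread = threading.Timer(CHECK_INTERVAL, self.unlocked_idle)
        self.thread.daemon = True  # assumption for the demo only
        self.thread.start()

    def stop(self):
        """Stop rescheduling and cancel any pending timer."""
        self.active = False
        if self.thread is not None:
            self.thread.cancel()


def start_service(poller):
    """Stand-in for service startup; returns the seconds spent in startup."""
    started = time.monotonic()
    poller.unlocked_idle()  # returns after one check, even on an idle host
    return time.monotonic() - started
```

With a callable that always reports an idle host, `start_service(TimerPoller(lambda: False))` returns after the first check while the later checks carry on in timer threads. A while loop in the same place would only return once activity was detected, which is the startup hang described above.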