
Nimby while loop preventing RQD startup

Open EliteIzzy opened this issue 2 years ago • 1 comment

Describe the bug: Nimby startup can get stuck in a while loop, preventing RQD from starting up properly, if no one is using the host when RQD restarts.

To Reproduce: On CentOS 7 (at least in our setup), have a user logged into a desktop host, then restart the rqd service while the machine is not in use, so that the idle while loop keeps running.

Expected behavior: Nimby should not prevent RQD startup.

Additional context: We have fixed this in our build by using the same thread timer idea as in other functions in the same class.

For class NimbySelect(Nimby):

    def unlockedIdle(self):
        """Nimby State: Machine is idle, host is unlocked,
                        waiting for user activity"""
        log.warning("UnlockedIdle Nimby")
        if self.active and (not self.results[0] == [] or not self.rqCore.machine.isNimbySafeToRunJobs()):
            log.warning("Is active, locking Nimby")
            self.closeEvents()
            self.lockNimby()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.lockedInUse)
            self.thread.start()

        elif self.active:
            try:
                self.openEvents()
                self.results = select.select(self.fileObjList, [], [], 5)
            # pylint: disable=broad-except
            except Exception:
                log.exception("failed to execute nimby check event")
            if not self.rqCore.machine.isNimbySafeToRunJobs():
                log.warning("memory threshold has been exceeded, locking nimby")
                self.active = True

            self.closeEvents()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.unlockedIdle)
            self.thread.start()

For class NimbyPynput(Nimby):

    def unlockedIdle(self):
        """Nimby State: Machine is idle, host is unlocked,
                        waiting for user activity"""
        log.warning("UnlockedIdle")
        if self.active and (self.interaction_detected or not self.rqCore.machine.isNimbySafeToRunJobs()):
            log.warning("Is active, locking Nimby")
            self.lockNimby()
            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.lockedInUse)
            self.thread.start()

        elif self.active:
            if not self.rqCore.machine.isNimbySafeToRunJobs():
                log.warning("memory threshold has been exceeded, locking nimby")
                self.active = True

            self.thread = threading.Timer(rqd.rqconstants.CHECK_INTERVAL_LOCKED,
                                          self.unlockedIdle)
            self.thread.start()
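
The difference between the blocking loop and the timer approach can be shown with a minimal, self-contained sketch. This is not the RQD code: the class names, `CHECK_INTERVAL`, the `locked` event and the `startup_returns_promptly` helper are all hypothetical stand-ins, and only the standard `threading` and `time` modules are used.

```python
import threading
import time

# Hypothetical interval; stands in for rqd.rqconstants.CHECK_INTERVAL_LOCKED.
CHECK_INTERVAL = 0.05


class BlockingIdle:
    """Polls in a while loop, so the caller is stuck until activity is seen."""

    def __init__(self):
        self.active = True
        self.interaction_detected = False

    def unlocked_idle(self):
        # Never returns while the machine stays idle.
        while self.active and not self.interaction_detected:
            time.sleep(CHECK_INTERVAL)


class TimerIdle:
    """Does one check per call, then re-arms a timer and returns."""

    def __init__(self):
        self.active = True
        self.interaction_detected = False
        self.locked = threading.Event()  # stands in for lockNimby()
        self.thread = None

    def unlocked_idle(self):
        if self.active and self.interaction_detected:
            self.locked.set()
        elif self.active:
            # Schedule the next check instead of looping in place.
            self.thread = threading.Timer(CHECK_INTERVAL, self.unlocked_idle)
            self.thread.daemon = True
            self.thread.start()

    def stop(self):
        self.active = False
        if self.thread is not None:
            self.thread.cancel()


def startup_returns_promptly(nimby, timeout=0.5):
    """Run unlocked_idle() on a helper thread; True if it returned in time."""
    caller = threading.Thread(target=nimby.unlocked_idle, daemon=True)
    caller.start()
    caller.join(timeout)
    return not caller.is_alive()
```

With an idle machine, the blocking version keeps its caller busy for as long as nobody touches the host, while the timer version returns immediately and still notices activity on a later tick.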

EliteIzzy • Mar 06 '23 21:03

Currently a bit too busy to submit the fix in a PR, my apologies.

I also realise this is quite hard to test on a single machine. If you remotely restart rqd on a host other than your own, you can reproduce it.

EliteIzzy • Mar 06 '23 21:03