Automatic regrade: add a maximum limit on retry attempts and rotate which submission is resent
When the Pending Submissions queue exceeds GRADER_STABLE_THRESHOLD retries, the system enters "UNSTABLE" mode, in which it retries only one submission per time window. The original idea was to avoid unnecessarily loading a system that may already be in an unstable state. Currently, however, the submission retried in this mode is always the same one, so that particular submission is retried many times while the others are not retried at all. The retried submission should be rotated so that retries do not accumulate on the same submission. There should also be an upper limit on submission retries, because the cause may be a faulty grader that is not fixed right away.
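A minimal sketch of the rotation plus a retry cap, assuming a per-submission limit (one of the options raised in the discussion). `PendingSubmission`, `MAX_RETRIES_PER_SUBMISSION`, `pick_for_unstable_retry` and `record_attempt` are hypothetical names for illustration, not existing A+ code. In each UNSTABLE time window it picks the least recently attempted submission that is still under the cap, so consecutive windows cycle through the queue.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical cap, chosen for illustration; not an existing A+ setting.
MAX_RETRIES_PER_SUBMISSION = 10


@dataclass
class PendingSubmission:
    # Hypothetical record for a submission waiting to be regraded.
    submission_id: int
    retries: int = 0
    last_attempt: Optional[datetime] = None


def pick_for_unstable_retry(pending: list[PendingSubmission]) -> Optional[PendingSubmission]:
    """Choose the single submission to resend in one UNSTABLE time window.

    Submissions that have reached the retry cap are skipped. Among the rest,
    never-attempted submissions come first, then the least recently attempted
    one, so retries rotate through the queue instead of piling up on one entry.
    """
    eligible = [p for p in pending if p.retries < MAX_RETRIES_PER_SUBMISSION]
    if not eligible:
        return None
    # Key (False, datetime.min) for never-attempted entries sorts before any
    # (True, timestamp); ties keep queue order because min() returns the first.
    return min(
        eligible,
        key=lambda p: (p.last_attempt is not None, p.last_attempt or datetime.min),
    )


def record_attempt(sub: PendingSubmission, now: datetime) -> None:
    # Called after the submission has been resent to the grader.
    sub.retries += 1
    sub.last_attempt = now
```

With three pending submissions, six consecutive windows resend 1, 2, 3, 1, 2, 3 rather than the first submission six times.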
This should be fixed this autumn. Manually removing submissions from the infinite retry loop costs the sysadmins time, because the system may retry the same submission forever even when it is clearly not going to pass.
@PasiSa Do you think the upper limit should apply to retries per submission, or to the total number of retries?
If I recall correctly, the most common reason for grading jobs not completing was that the grader for an exercise was somehow broken. Retries then accumulated on the submissions for that particular exercise, while other exercises still worked. So perhaps the best option would be an exercise-specific count.
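An exercise-specific count could look roughly like the sketch below. `ExerciseRetryTracker` and `MAX_FAILURES_PER_EXERCISE` are hypothetical names, and the choice to count *consecutive* failures (reset by any success) is an assumption, made so that one working exercise is never suspended because of old failures.

```python
from collections import defaultdict

# Hypothetical threshold, chosen for illustration; not an existing A+ setting.
MAX_FAILURES_PER_EXERCISE = 20


class ExerciseRetryTracker:
    """Counts consecutive grading failures per exercise (hypothetical helper).

    A broken grader affects every submission of one exercise, so the counter
    is keyed by exercise rather than by submission. A single success resets it.
    """

    def __init__(self, limit: int = MAX_FAILURES_PER_EXERCISE) -> None:
        self.limit = limit
        self.failures: dict[int, int] = defaultdict(int)

    def record_result(self, exercise_id: int, success: bool) -> None:
        if success:
            # The grader works again: forget earlier failures.
            self.failures.pop(exercise_id, None)
        else:
            self.failures[exercise_id] += 1

    def is_suspended(self, exercise_id: int) -> bool:
        # .get() avoids inserting a zero entry the way defaultdict[...] would.
        return self.failures.get(exercise_id, 0) >= self.limit
```

Normal retries would then skip submissions whose exercise `is_suspended()`, while other exercises keep being regraded as usual.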
I don't remember why the upper limit was mentioned in the issue description, though: once the problem with the exercise grader is fixed (even if that takes days), it would be nice if the system automatically started regrading the earlier unfinished submissions. So some kind of infrequent probing might still be useful to detect when grading starts to work again.
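One way to combine a limit with infrequent probing is exponential backoff: after an exercise is suspended, resend a single submission at an interval that doubles after each failed probe, up to a ceiling. This is a sketch under assumptions; `GraderProbe`, `BASE_PROBE_INTERVAL` and `MAX_PROBE_INTERVAL` are hypothetical, and the 30-minute base and 24-hour ceiling are illustrative values only.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical values, chosen for illustration only.
BASE_PROBE_INTERVAL = timedelta(minutes=30)
MAX_PROBE_INTERVAL = timedelta(hours=24)


class GraderProbe:
    """Infrequent probing of a suspended exercise's grader (hypothetical).

    Instead of giving up for good, one pending submission is resent at a
    growing interval. When a probe succeeds, the caller can resume regrading
    the exercise's unfinished submissions.
    """

    def __init__(self) -> None:
        self.failed_probes = 0
        self.next_probe: Optional[datetime] = None

    def interval(self) -> timedelta:
        # The wait doubles with each consecutive failed probe, capped at
        # MAX_PROBE_INTERVAL. The exponent is capped too, so months of failed
        # probes cannot overflow timedelta arithmetic.
        steps = min(self.failed_probes, 10)
        return min(BASE_PROBE_INTERVAL * (2 ** steps), MAX_PROBE_INTERVAL)

    def due(self, now: datetime) -> bool:
        return self.next_probe is None or now >= self.next_probe

    def record_probe(self, now: datetime, success: bool) -> bool:
        """Record a probe result; return True if regrading should resume."""
        if success:
            self.failed_probes = 0
            self.next_probe = None
            return True
        self.failed_probes += 1
        self.next_probe = now + self.interval()
        return False
```

With these values, a grader that stays broken for days is probed at most once a day, and the first successful probe lets the system resume regrading without manual action by the sysadmins.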