Cannot evaluate communication tasks with at least 10 user processes

Open Daniel-Aga opened this issue 3 years ago • 2 comments

Description: The system cannot evaluate submissions for communication tasks that are configured with 10 or more user processes. Such submissions get "stuck" in the evaluation phase and are re-evaluated (indefinitely?). In the sandbox logs, the status is XX, and the message is execve("./task"):Resource Temporarily Unavailable. I believe the cause is the definition of box_id in cms/grading/Sandbox.py, lines 861-873. The code there allocates 10 ids per worker shard, so when a worker evaluates a submission with at least 10 user processes (plus one additional manager process), it needs at least 11 sandboxes and ends up reusing ids.

As a workaround, one could increase the number of ids allocated for each worker shard in Sandbox.py, but perhaps we can find a more generic fix.
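
To illustrate the suspected collision, here is a minimal sketch. It is a simplified model, not the actual code in Sandbox.py: the function `box_id`, the helper `duplicate_ids`, and the wrap-around formula are assumptions made for illustration. Only the figure of 10 ids per shard comes from the report above.

```python
from collections import Counter

# Taken from the report: 10 ids are reserved per worker shard.
IDS_PER_SHARD = 10


def box_id(shard: int, index: int, ids_per_shard: int = IDS_PER_SHARD) -> int:
    """Hypothetical model: each shard owns a block of ids and wraps inside it.

    The real formula in cms/grading/Sandbox.py may differ.
    """
    return shard * ids_per_shard + index % ids_per_shard


def duplicate_ids(shard: int, num_sandboxes: int,
                  ids_per_shard: int = IDS_PER_SHARD) -> list[int]:
    """Return the box ids assigned to more than one sandbox."""
    ids = [box_id(shard, i, ids_per_shard) for i in range(num_sandboxes)]
    return sorted(b for b, count in Counter(ids).items() if count > 1)


# 9 user processes + 1 manager = 10 sandboxes: every id is distinct.
print(duplicate_ids(shard=0, num_sandboxes=10))
# 10 user processes + 1 manager = 11 sandboxes: one id is reused.
print(duplicate_ids(shard=0, num_sandboxes=11))
```

Under this model, raising `ids_per_shard` (the workaround above) only moves the threshold; it does not remove it.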

Expected: The submissions should be evaluated correctly.

Actual: The submissions are re-evaluated due to sandbox errors.

System Information

CMS version: 1.4.rc1
Was CMS installed: yes
Using a virtualenv: no

Daniel-Aga · May 20 '22 18:05

Thanks for this report. I think we should look into having a more reliable solution.

Out of curiosity: does your use case require a fixed number of box ids (>= 10), or a variable number?

wil93 · Nov 27 '22 17:11

Initially I wanted to run a large (though fixed) number of boxes, around 500, but in the end I figured out a way to rephrase the task and use only 3, so it was fine 😃

By the way, when I worked on this I tried increasing the number of sandbox ids and ran into another problem: 29 user processes worked, but 30 (or more) didn't. The reason turned out to be that isolate allowed only 64 open files per process, so the manager could not open the fifos to the last user process. The latest version of isolate accepts the open-files limit as a command-line argument ("-n"), so once CMS migrates to that version, we should make sure to set the correct open-files limit when initializing the sandbox for communication tasks. I wrote about this on Gitter back then and wanted to add the info to this issue, but forgot...
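
A back-of-the-envelope sketch of the open-files arithmetic follows. It assumes two fifos per user process (one in each direction) and about 5 other descriptors held by the manager. Both numbers are guesses chosen only because they reproduce the 29/30 threshold seen with the 64-file limit; they were not measured. The function name `required_open_files` is hypothetical.

```python
# The per-process limit mentioned above.
ISOLATE_OPEN_FILES_LIMIT = 64


def required_open_files(user_processes: int,
                        fifos_per_process: int = 2,
                        other_fds: int = 5) -> int:
    """Estimate how many descriptors the manager needs open at once.

    fifos_per_process and other_fds are assumptions, not measured values.
    """
    return user_processes * fifos_per_process + other_fds


for n in (29, 30, 500):
    needed = required_open_files(n)
    fits = needed <= ISOLATE_OPEN_FILES_LIMIT
    print(f"{n} user processes: ~{needed} descriptors, fits in 64: {fits}")
```

An estimate like this could be used to derive the value passed to isolate's "-n" option when the manager sandbox is set up, instead of hard-coding a limit.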

Daniel-Aga · Nov 28 '22 18:11