
Autoscale formula metrics for number of task slots

Open • okofish opened this issue 4 years ago • 1 comment

Feature Request Description

My Azure Batch-based application uses a constraint-based scheduling convention wherein task slots represent vCPUs. If I have a pool using a 4-vCPU VM size then I set "Task slots per node" to 4, and when I submit a task that needs access to 2 vCPUs I set the task's "Required slots" to 2.

I have been unable to get this scheme to work reliably with autoscaling, because the autoscale formula language is unaware of tasks' task slot requirements. Information on task slots per node is available in the $TaskSlotsPerNode variable, but there does not appear to be any information on the slot requirements of existing tasks. This means that any autoscaling formula implicitly bakes in the assumption that every task requires exactly one slot.
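Here is a minimal Python sketch of the mismatch, using the 4-vCPU example above. A formula that divides a task count by $TaskSlotsPerNode effectively treats every task as one slot. The function names are hypothetical, and the sketch ignores how the autoscale service samples metrics over time. Dividing the slot total by slots per node gives only a lower bound, because tasks with larger slot requirements may not pack perfectly onto nodes.

```python
import math
from typing import List

def nodes_assuming_one_slot_per_task(active_tasks: int, slots_per_node: int) -> int:
    """What a task-count-based formula effectively computes: each task fills one slot."""
    return math.ceil(active_tasks / slots_per_node)

def nodes_from_required_slots(required_slots: List[int], slots_per_node: int) -> int:
    """Lower bound on nodes once each task's requiredSlots is taken into account."""
    return math.ceil(sum(required_slots) / slots_per_node)

# Hypothetical queue: 8 tasks needing 2 vCPUs each, on a 4-vCPU VM size (4 slots per node).
queue = [2] * 8
print(nodes_assuming_one_slot_per_task(len(queue), 4))  # 2
print(nodes_from_required_slots(queue, 4))              # 4
```

In this example, the task-count-based estimate requests half the nodes the queue actually needs.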

Describe Preferred Solution

I think the ideal solution is to introduce a task-slot-wise version of each task metric variable. This might look like:

| Existing task-wise metric | New task-slot-wise metric |
|---|---|
| $ActiveTasks | $ActiveTaskSlots |
| $RunningTasks | $RunningTaskSlots |
| $PendingTasks | $PendingTaskSlots |
| $SucceededTasks | $SucceededTaskSlots |
| $FailedTasks | $FailedTaskSlots |
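To make the proposal concrete, here is a minimal Python sketch of how each slot-wise metric might relate to its task-wise counterpart. It assumes that a slot-wise metric is the sum of requiredSlots over the same tasks the task-wise metric counts. It models only a single snapshot of task states, whereas the real metrics are sampled time series. It follows the documented definition of $PendingTasks as active plus running tasks. The function names and the (state, required_slots) input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def task_and_slot_counts(tasks: Iterable[Tuple[str, int]]) -> Tuple[Counter, Counter]:
    """Count tasks per state, and sum their required slots per state.

    The first Counter plays the role of the existing task-wise metrics
    ($ActiveTasks, $RunningTasks). The second plays the role of the proposed,
    hypothetical slot-wise metrics ($ActiveTaskSlots, $RunningTaskSlots).
    """
    task_counts: Counter = Counter()
    slot_counts: Counter = Counter()
    for state, required_slots in tasks:
        task_counts[state] += 1
        slot_counts[state] += required_slots
    return task_counts, slot_counts

def pending(counts: Counter) -> int:
    """Pending = active + running, mirroring how $PendingTasks is defined."""
    return counts["active"] + counts["running"]

# Hypothetical snapshot of (state, requiredSlots) pairs.
snapshot = [("active", 2), ("active", 2), ("running", 4), ("running", 1)]
task_counts, slot_counts = task_and_slot_counts(snapshot)
print(task_counts["active"], slot_counts["active"])    # 2 4
print(task_counts["running"], slot_counts["running"])  # 2 5
print(pending(task_counts), pending(slot_counts))      # 4 9
```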

Describe Alternatives Considered

The current task-wise metric variables could be changed to reflect task slots instead of whole tasks. This would be a breaking change.

Additional Context

The proposed addition of task-slot-wise metrics would be analogous to the addition of the TaskSlotCounts object to the response of the Job_GetTaskCounts operation in API version 2020-09-01.12.0.

okofish • Oct 26 '21 20:10

Hi @alfpark, does "known issue" mean this will not get fixed? It sounds more like a feature request to me, and a very valuable one!

I have another use case where I run into the exact same problem. I have a job whose tasks depend on each other. Each job has one task that is more compute-intensive than the others, and I usually set its required slots to the pool's maximum slots per node so that it runs alone on one VM with all of that VM's resources available.

The dependent tasks that run later are about 10x as numerous, but they run faster and need only one CPU each, so I set their required slots to 1.

With this mix, it's impossible to calculate in an autoscale formula how many nodes are actually needed...
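Here is a minimal Python sketch of why a fixed per-task assumption breaks down for this kind of mix. It assumes a hypothetical pool with 4 slots per node, heavy tasks that require all 4 slots, and light tasks that require 1. It compares three estimates for a snapshot of queued tasks:

- one slot per task
- one full node per task
- the sum of required slots, which is a lower bound in general but exact here, since 4-slot and 1-slot tasks pack without waste

The snapshot sizes are illustrative, for example heavy tasks from several jobs queued at the same time. They are not taken from a real workload.

```python
import math
from typing import List

SLOTS_PER_NODE = 4  # hypothetical: 4-vCPU VM size, one slot per vCPU

def one_slot_per_task(required: List[int]) -> int:
    """Nodes if every task is assumed to need exactly one slot."""
    return math.ceil(len(required) / SLOTS_PER_NODE)

def full_node_per_task(required: List[int]) -> int:
    """Nodes if every task is assumed to need a whole node."""
    return len(required)

def slot_aware(required: List[int]) -> int:
    """Nodes from the total of each task's required slots."""
    return math.ceil(sum(required) / SLOTS_PER_NODE)

snapshots = {
    "3 heavy tasks": [4] * 3,
    "30 light tasks": [1] * 30,
    "1 heavy + 10 light": [4] + [1] * 10,
}
for name, required in snapshots.items():
    print(name, one_slot_per_task(required), full_node_per_task(required), slot_aware(required))
# 3 heavy tasks 1 3 3
# 30 light tasks 8 30 8
# 1 heavy + 10 light 3 11 4
```

The one-slot assumption under-provisions whenever heavy tasks are queued. The full-node assumption over-provisions light tasks by roughly a factor of four. Only the slot-aware estimate tracks both.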

So my question is: will this feature, which would let the formula use task slot variables instead of task counts, be added soon? Or is there another solution or workaround for these use cases that I should consider instead of autoscaling?

Thanks, Michael

MichaCo • Jun 21 '22 14:06