odc fix(taskframework): job is not started on a resource-idle node in a multi-node deployment

What type of PR is this?

type-bug

What this PR does / why we need it:

In a multi-node deployment, a job may not be started even though some nodes have enough resources. When the JobStore acquires a trigger on a node whose resources are not available, it should release the trigger and give other nodes a chance to acquire it.
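
The release-on-no-resource idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `SharedTriggerStore`, `Node`, and `tryRunNext` are hypothetical stand-ins, not the Quartz JobStore API and not the actual ResourceDetectJobStore code in this PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Simplified simulation; every type here is hypothetical and for illustration only. */
public class ResourceAwareAcquireSketch {

    /** Stand-in for a clustered store: a trigger is held by at most one node at a time. */
    static class SharedTriggerStore {
        private final Deque<String> waiting = new ArrayDeque<>();

        synchronized void add(String trigger) {
            waiting.addLast(trigger);
        }

        /** Hands out the next waiting trigger, or null if none is waiting. */
        synchronized String acquireNext() {
            return waiting.pollFirst();
        }

        /** Puts an acquired trigger back so that another node can acquire it. */
        synchronized void release(String trigger) {
            waiting.addFirst(trigger);
        }
    }

    /** Stand-in for one scheduler node with a local resource check. */
    static class Node {
        final String name;
        final BooleanSupplier resourceAvailable;
        final List<String> started = new ArrayList<>();

        Node(String name, BooleanSupplier resourceAvailable) {
            this.name = name;
            this.resourceAvailable = resourceAvailable;
        }

        /** Returns true only if this node kept the trigger and started the job. */
        boolean tryRunNext(SharedTriggerStore store) {
            String trigger = store.acquireNext();
            if (trigger == null) {
                return false; // nothing to acquire
            }
            if (!resourceAvailable.getAsBoolean()) {
                store.release(trigger); // no local resource: give the trigger back
                return false;
            }
            started.add(trigger);
            return true;
        }
    }

    public static void main(String[] args) {
        SharedTriggerStore store = new SharedTriggerStore();
        store.add("job-1");
        Node busy = new Node("node-a", () -> false); // no resource available
        Node idle = new Node("node-b", () -> true);  // enough resource
        busy.tryRunNext(store); // acquires, then releases
        idle.tryRunNext(store); // acquires and starts the job
        System.out.println(busy.name + " started " + busy.started
                + ", " + idle.name + " started " + idle.started);
    }
}
```

In this sketch the busy node acquires the trigger first but hands it back, so the idle node can still pick it up. Without the `release` call the trigger would stay with the busy node and the job would not start anywhere.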

Which issue(s) this PR fixes:

Special notes for your reviewer:

Tested with two nodes: tasks are started evenly across both nodes.

Additional documentation e.g., usage docs, etc.:

May 11 '24 11:05 krihy

Please add more information to tell the reviewer how this issue happens and how you solve it. I can only get the issue and the solution from your code. @krihy

May 13 '24 09:05 yhilmare

I cannot accept your solution:

StartPreparingJob is a daemon job on every node; we had better not disable it on a specific node.
We already have the rate limiter MonitorProcessRateLimiter; why does it work?
Will other nodes be affected if you call releaseAcquiredTrigger in ResourceDetectJobStore?

May 13 '24 09:05 yhilmare

In cluster mode, all Quartz nodes compete for the trigger, and it is fired exactly once. If a node with no available resources acquires the trigger and fires it, the job will not be started by StartPreparingJob in the QuartzSchedulerThread while loop, and the other nodes that do have enough resources will not acquire the trigger.

May 13 '24 10:05 krihy

As we discussed before, in cluster mode each node should run StartPreparingJob.

May 13 '24 10:05 yizhouxw

So why don't we rely on MonitorProcessRateLimiter instead of disabling StartPreparingJob?

May 13 '24 11:05 yhilmare