more flexible job manager end state
With the new internal queue, jobs are automatically retried in case more jobs are created than the allowed number of parallel jobs.
Since the job manager runs until all jobs end in finalized, start failed or error, it does not support the internal queueing.
Ideally we would build in some flexibility that allows the user to submit and track more parallel jobs than those supported with their standard account. Can we make the 'end condition' on start_failed more flexible while not risking an endless loop?
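One way to relax the end condition without risking an endless loop could be a bounded retry budget per job. The sketch below only illustrates the idea: `MAX_START_RETRIES`, `job_is_done`, `all_jobs_done` and the `start_attempts` counter are hypothetical names, not part of the current job manager, and the set of final statuses is an assumption based on the standard openEO job statuses.

```python
from typing import Dict, Iterable

# Hypothetical cap on how often a single job may be resubmitted.
MAX_START_RETRIES = 5

# Assumption: statuses that always end tracking for a job.
FINAL_STATUSES = {"finished", "error", "canceled"}


def job_is_done(status: str, start_attempts: int) -> bool:
    """Return True when the job manager no longer needs to track this job."""
    if status in FINAL_STATUSES:
        return True
    if status == "start_failed":
        # Only final once the retry budget is used up, so the loop stays bounded.
        return start_attempts >= MAX_START_RETRIES
    return False


def all_jobs_done(jobs: Iterable[Dict]) -> bool:
    """End condition: every job is final or has exhausted its start retries."""
    return all(job_is_done(j["status"], j.get("start_attempts", 0)) for j in jobs)
```

With this shape, a job in `start_failed` keeps the manager running only while it still has attempts left, so the total number of iterations stays finite.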
> Since the job manager runs until all jobs end in finalized, start failed or error, it does not support the internal queueing.
I'm not sure I understand what you mean. The "internal queuing" feature is just an internal backend thing by design, I don't think there is anything required client-side to support that.
Something that might be possible however, is to have a standard API to discover and leverage job submission limits as discussed at
- https://github.com/Open-EO/openeo-api/issues/559
Will create a minimal example to reproduce the issue.
Narrowed down the issue:
It comes from the try/except block in the PR: https://github.com/Open-EO/openeo-python-client/pull/736
```python
def execute(self) -> _TaskResult:
    """
    Executes the job start process using the OpenEO connection.

    Authenticates if a bearer token is provided, retrieves the job by ID,
    and attempts to start it.

    :returns:
        A `_TaskResult` with status and statistics metadata, indicating
        success or failure of the job start.
    """
    try:
        conn = openeo.connect(self.root_url)
        if self.bearer_token:
            conn.authenticate_bearer_token(self.bearer_token)
        job = conn.job(self.job_id)
        job.start()
        _log.info(f"Job {self.job_id} started successfully")
        return _TaskResult(
            job_id=self.job_id,
            db_update={"status": "queued"},
            stats_update={"job start": 1},
        )
    except Exception as e:
        _log.error(f"Failed to start job {self.job_id}: {e}")
        return _TaskResult(
            job_id=self.job_id,
            db_update={"status": "start_failed"},
            stats_update={"start_job error": 1},
        )
```
Failed to start job j-2504220752104722b90406957695f315: [429] Too Many Requests
--> We need to avoid labeling `[429] Too Many Requests` errors as `start_failed` and instead keep those jobs as `created`, so they can be started again later.
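A minimal sketch of what that distinction could look like, assuming the raised exception exposes the HTTP status code as an `http_status_code` attribute (treat that as an assumption to verify against the client's exception classes). `status_after_start_failure` and `FakeApiError` are hypothetical names used only for illustration.

```python
from typing import Optional

HTTP_TOO_MANY_REQUESTS = 429


def status_after_start_failure(exc: Exception) -> str:
    """Pick the job status to store after a failed start attempt."""
    # Assumption: API errors carry the HTTP status as `http_status_code`.
    code: Optional[int] = getattr(exc, "http_status_code", None)
    if code == HTTP_TOO_MANY_REQUESTS:
        # Backend is throttling: keep the job as "created" so it is retried.
        return "created"
    # Any other failure is still treated as a real start failure.
    return "start_failed"


class FakeApiError(Exception):
    """Hypothetical stand-in for an API error, used only in this example."""

    def __init__(self, message: str, http_status_code: int):
        super().__init__(message)
        self.http_status_code = http_status_code
```

In the `except` branch above, the result could then use `db_update={"status": status_after_start_failure(e)}` instead of hard-coding `start_failed`. To stay clear of an endless loop, this would likely need to be combined with some limit on the number of start attempts.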
related:
- https://github.com/Open-EO/openeo-python-client/issues/764