[TRI-967] Some JobRunExecutions are getting stuck in "STARTED" state even though the graphile job has an error
The graphile job shows the error "Response: 404", but the JobRunExecution stays in the STARTED state. It should have been marked as errored, and the run along with it. This needs to be investigated.
@ericallam When the server fails to connect to the endpoint, we throw an error inside the graphile worker job so that the worker handles the retry according to the maxAttempts parameter. If the server still can't establish a connection after all retry attempts, the graphile worker marks the job as failed, but we never update the status of the job run, so it stays at its initial value of STARTED.
https://github.com/triggerdotdev/trigger.dev/blob/d1ecd6b99cb491640a2d8f24579a436737104cb0/apps/webapp/app/services/runs/performRunExecutionV2.server.ts#L260-L262
https://github.com/triggerdotdev/trigger.dev/blob/d1ecd6b99cb491640a2d8f24579a436737104cb0/apps/webapp/app/services/runs/performRunExecutionV2.server.ts#L772-L774
When #337 is done, we can handle the retry as part of the run execution instead of letting the graphile worker do it for us.
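Until then, one possible interim fix is to record the failure ourselves on the last attempt, just before rethrowing to the worker. The following is only a rough sketch of that idea, not the actual code in performRunExecutionV2.server.ts: the `Execution` and `JobAttemptInfo` types, the `recordAttemptFailure` helper, and the `FAILURE` status name are all hypothetical stand-ins, and it assumes the job's current attempt count and retry limit are available where the error is thrown.

```typescript
// Hypothetical stand-ins for illustration; these names are not taken from the codebase.
type ExecutionStatus = "STARTED" | "SUCCESS" | "FAILURE";

interface Execution {
  id: string;
  status: ExecutionStatus;
  error?: string;
}

// Assumed to be readable from the worker job at the point where we throw.
interface JobAttemptInfo {
  attempts: number; // attempts made so far, including the current one
  maxAttempts: number; // retry limit configured for the job
}

// Returns the execution as it should be persisted after a failed attempt.
// Intended to be called in the catch block, before rethrowing the error
// so the worker still records it and schedules any remaining retries.
function recordAttemptFailure(
  execution: Execution,
  job: JobAttemptInfo,
  error: unknown
): Execution {
  if (job.attempts < job.maxAttempts) {
    // The worker will retry, so the execution legitimately stays STARTED.
    return execution;
  }

  // Final attempt: nothing will run again, so record the failure instead of
  // leaving the execution stuck in STARTED.
  const message = error instanceof Error ? error.message : String(error);
  return { ...execution, status: "FAILURE", error: message };
}
```

With this shape, an intermediate failure leaves the record untouched, and only the final failed attempt moves it out of STARTED. The run itself would need the same treatment wherever its status is persisted.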