BigQuery job schedules intermittently time out
Hello, twice a day I run a large schedule on BigQuery, configured in Dataform Web. It contains 340 jobs and growing (230 datasets, 100 assertions, 10 operations).
The schedule takes about 10 minutes when it succeeds, but about twice a week it times out after 90 minutes. The cause appears to be that Dataform loses track of the status of some jobs in BigQuery: one or more jobs stay marked as running in Dataform (I have seen as many as four hung in this state).
The jobs actually complete successfully (as shown in the BigQuery query history), but Dataform keeps treating them as running until the timeout is reached. The rest of the schedule doesn't run, and PagerDuty isn't triggered the way it is for a normal assertion or syntax error failure. It seems like a status-tracking thread is getting lost somewhere.
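To narrow down which jobs are affected, one option is to compare what Dataform shows against what BigQuery reports. Below is a minimal sketch, assuming you have already collected both sets of statuses into dicts keyed by action name (how you gather them is up to you). BigQuery job states really are "PENDING", "RUNNING" or "DONE"; the Dataform status strings and the function name `find_stuck_actions` are hypothetical placeholders.

```python
from typing import Dict, List

# BigQuery reports a job's state as "PENDING", "RUNNING" or "DONE".
BQ_DONE = "DONE"


def find_stuck_actions(
    dataform_status: Dict[str, str],
    bq_job_state: Dict[str, str],
) -> List[str]:
    """Return actions that Dataform still shows as running although BigQuery reports them DONE.

    Both dicts are keyed by a hypothetical action name. The "RUNNING" label on
    the Dataform side is an assumed placeholder, not a documented Dataform value.
    """
    stuck = []
    for action, status in dataform_status.items():
        # A job is "stuck" only if BigQuery finished it but Dataform never noticed.
        if status == "RUNNING" and bq_job_state.get(action) == BQ_DONE:
            stuck.append(action)
    return sorted(stuck)


if __name__ == "__main__":
    dataform = {"orders": "RUNNING", "users": "SUCCESSFUL", "events": "RUNNING"}
    bigquery = {"orders": "DONE", "users": "DONE", "events": "RUNNING"}
    print(find_stuck_actions(dataform, bigquery))  # ['orders']
```

A job that is still running in BigQuery ("events" above) is not flagged, so the list only contains the mismatches described in this post.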
Any help greatly appreciated.
For those who are having the same issue: I talked to support, and they said it's a known issue that will only be fixed in the next version of Dataform. For now, you have to retry the job.
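If you trigger the schedule from your own tooling, the retry can be automated. A minimal sketch follows, assuming a hypothetical `run_once` callable that starts the run, blocks until it finishes, and returns a final status string such as "SUCCESS", "FAILED" or "TIMED_OUT" (these names are placeholders, not a Dataform API). Only timeouts are retried, so genuine assertion or syntax failures are returned immediately and can still alert.

```python
from typing import Callable


def run_with_retries(run_once: Callable[[], str], max_attempts: int = 3) -> str:
    """Call run_once() again while it reports "TIMED_OUT", up to max_attempts times.

    run_once is a hypothetical callable; the status strings are assumed placeholders.
    Returns the last status seen, so a caller can still alert if every attempt timed out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = "TIMED_OUT"
    for attempt in range(1, max_attempts + 1):
        status = run_once()
        if status != "TIMED_OUT":
            # Success or a real failure: stop retrying and report it as-is.
            return status
        print(f"attempt {attempt} timed out, retrying")
    return status


if __name__ == "__main__":
    outcomes = iter(["TIMED_OUT", "SUCCESS"])
    print(run_with_retries(lambda: next(outcomes)))  # SUCCESS
```

Note that each attempt can still wait the full 90 minutes before timing out; this only saves the manual re-run, it does not shorten the hang.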
Same issue here. Our project contains 20 .js jobs (2,500 datasets).