dataform icon indicating copy to clipboard operation
dataform copied to clipboard

BigQuery job schedules intermittently timeout

Open benwatson528 opened this issue 4 years ago • 2 comments

Hello, I run a large schedule twice daily on BigQuery (via a schedule configured in Dataform Web) that contains 340 jobs and growing (230 datasets, 100 assertions, 10 operations).

The schedule takes about 10 minutes when successful, but about twice a week it times out after 90 minutes. This appears to be because it loses track of the status of some jobs in BQ. One or more jobs will appear as still running in Dataform (I have seen as many as four jobs hung in this state).

The jobs will complete successfully (as observed in the BQ query history), but Dataform still thinks they're running, and does so until the timeout limit is reached. The rest of the schedule doesn't run, and PagerDuty doesn't trigger like it would in the event of a normal assertion/syntax error failure. It seems like a tracking thread is getting lost somewhere.

Any help greatly appreciated.

benwatson528 avatar Jan 28 '22 08:01 benwatson528

For those who are having the same issue, I talked to the support and they said it's a known issue and it'll be fixed only in the next version of Dataform. For now, you have to retry your job.

giyama avatar Jul 11 '22 13:07 giyama

Same issue here, our project contains 20 .js jobs (2500 datasets).

felipemellodito avatar Jul 11 '22 15:07 felipemellodito