
Attempting to re-create datasets in BigQuery in another region leads to confusing errors

Open lewish opened this issue 6 years ago • 1 comments

In BigQuery, when starting a run, the first thing we do is attempt to create a dataset in the default location provided when connecting to a warehouse. If a dataset with this name already exists but in another region, the call will silently fail and not create the dataset.

Any attempts to create tables or views in that dataset will subsequently fail with an error message like: Dataset [my_dataset] does not exist in location EU

When we attempt to create the dataset and it already exists in a region different from the one specified in the data warehouse connection profile, we should throw an appropriate error in prepareDataset: https://github.com/dataform-co/dataform/blob/master/api/dbadapters/bigquery.ts#L120

Something like: Cannot create dataset my_dataset in location EU as it already exists in location US/another location. Change your default dataset location or delete the existing dataset.
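A minimal sketch of what such a check could look like, assuming the adapter can look up the location of an existing dataset before creating it. The names `checkDatasetLocation`, `assertDatasetLocation` and `DatasetLocationLookup` are hypothetical and not part of the Dataform codebase, and comparing locations case-insensitively is also an assumption:

```typescript
// Hypothetical lookup: resolves to the location of an existing dataset,
// or undefined when no dataset with that name exists. In a real adapter
// this would wrap a call to the warehouse API.
type DatasetLocationLookup = (dataset: string) => Promise<string | undefined>;

// Pure check: returns an error message when the dataset already exists in a
// different location than the requested one, otherwise undefined.
function checkDatasetLocation(
  dataset: string,
  requestedLocation: string,
  existingLocation: string | undefined
): string | undefined {
  if (existingLocation === undefined) {
    // No existing dataset, so creation can proceed.
    return undefined;
  }
  // Assumption: location names are compared case-insensitively.
  if (existingLocation.toLowerCase() === requestedLocation.toLowerCase()) {
    return undefined;
  }
  return (
    `Cannot create dataset ${dataset} in location ${requestedLocation} ` +
    `as it already exists in location ${existingLocation}. ` +
    `Change your default dataset location or delete the existing dataset.`
  );
}

// Async wrapper that throws instead of failing silently.
async function assertDatasetLocation(
  dataset: string,
  requestedLocation: string,
  lookup: DatasetLocationLookup
): Promise<void> {
  const message = checkDatasetLocation(dataset, requestedLocation, await lookup(dataset));
  if (message !== undefined) {
    throw new Error(message);
  }
}
```

Keeping the comparison in a pure function makes the message easy to test without a warehouse connection; the wrapper only adds the lookup and the throw.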

lewish, Mar 06 '19 16:03

Indeed. I have a Firebase project in the US region which exports data into BigQuery daily (same US location). My server events have historically gone to tables in the europe-west2 region. Any attempt to run a query on the US-region tables from my Dataform project, which is configured for europe-west2, results in: Error: Not found: Dataset my-data:analytics_152444 was not found in location europe-west2.

mshakhomirov, Aug 05 '21 17:08

We no longer fail silently here. Instead, we fail loudly with the error that BigQuery gives us: bigquery error: Not found: Dataset cloud-dataform-testing:dataform_core_testing was not found in location EU.

The error message isn't great, but I don't think we should change this:

  • We're returning the literal error that BigQuery gives us when attempting to recreate a schema (https://github.com/dataform-co/dataform/blob/c0d1a7400f4aed74f90e032149561a3027df0ea4/cli/api/commands/run.ts#L436C60-L436C70). It is their API error that should be changed to make it clearer.
  • Attempting to hack around the returned error message will likely create other errors, where we say it's due to a location error when it's not.
  • It's not too difficult to deduce why schema creation failed in these situations.

So I'm closing this now, as it works as intended.

Ekrekr, Mar 26 '24 11:03