
BigQuery: adding a column with a list of strings/... leads to an incorrect ALTER TABLE statement

Open yasinzaehringer-paradime opened this issue 1 year ago • 1 comments

The background is this code:

  • https://github.com/PeerDB-io/peerdb/blob/main/flow/connectors/bigquery/bigquery.go#L236-L239
  • https://github.com/PeerDB-io/peerdb/blob/main/flow/connectors/bigquery/qvalue_convert.go#L41-L43

In short, the ALTER TABLE statement drops the array type when the new column is a list, so the new column is created with the base type instead. For example, if you add a column of type list of strings, PeerDB creates a column of type STRING in BigQuery rather than ARRAY<STRING>. This leads to Avro errors down the line, which look like this:

[...]
failed to sync records:
failed to push to avro stage:
failed to write records to local Avro file:
failed to write records to temporary Avro file:
failed to write records to OCF writer:
failed to write record to OCF:
cannot translate datum to binary:
[...]
cannot encode binary record "<affected table name>" field "<affected column name>":
value does not match its schema:
cannot encode binary union:
no member schema types support datum:
allowed types: [null string]; received: map[string]interface {}

Note: don't be deceived by "allowed types: [null string]; received: map[string]interface {}". This error message is very misleading. What matters here is the value, not the type: the value is map[array:[]], which is correct, since we want to write an empty list. But Avro only accepts nullable strings here, because the BigQuery schema is incorrect.
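
To make the expected behaviour concrete, here is a minimal sketch of how the column type in the ALTER TABLE statement could take the list case into account. This is not PeerDB's actual code: columnKind, bigQueryColumnType and addColumnStatement are hypothetical names made up for this illustration, and the real connector works with its own type definitions.

```go
package main

import "fmt"

// columnKind is a hypothetical, simplified description of a source column.
// It is NOT a PeerDB type; it only exists for this sketch.
type columnKind struct {
	Base    string // BigQuery scalar type of the element, e.g. "STRING"
	IsArray bool   // true if the source column is a list of Base
}

// bigQueryColumnType returns the type to use in DDL.
// The bug described above corresponds to always returning k.Base.
func bigQueryColumnType(k columnKind) string {
	if k.IsArray {
		return fmt.Sprintf("ARRAY<%s>", k.Base)
	}
	return k.Base
}

// addColumnStatement builds the ALTER TABLE statement for a new column.
func addColumnStatement(dataset, table, column string, k columnKind) string {
	return fmt.Sprintf("ALTER TABLE `%s.%s` ADD COLUMN IF NOT EXISTS `%s` %s",
		dataset, table, column, bigQueryColumnType(k))
}
```

With this sketch, a list-of-strings column named tags would be added as ARRAY<STRING>, so the Avro schema derived from the table would presumably accept an array value instead of only a nullable string.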

@Amogh-Bharadwaj probably something that'd be fixed as part of #1679

serprex • May 18 '24 18:05