python-bigquery-pandas Adding `if_exists="override" Mode

Is your feature request related to a problem? Please describe. is there mode to do truncate load for existing table if_exists="override", it could be useful to avoid dropping table

Describe the solution you'd like being able to override data in existing table.

Sep 29 '21 12:09 ismailsimsek

Can you share a bit more about your use case? You'd like to replace the data, but keep the table intact? I suppose this is related to your other request about preserving clustering/partitioning?

Sep 29 '21 22:09 tswast

correct i like to replace the data, but keep the table intact. dropping table always has risk of losing table metadata, and could cause failing queries for the time between drop and create. override is safer operation.

Sep 30 '21 06:09 ismailsimsek

What if we call it "truncate" to match what the API is actually doing? If/when we support non-load jobs, we can run a DML "truncate" operation first via SQL.

Edit: On second thought, the "truncate" write disposition can ignore the existing schema. I think this safer replace requires a DML drop rows, followed by an append rows job, even without using streaming.

Sep 30 '21 17:09 tswast

could cause failing queries for the time between drop and create

To avoid these problems, a single "truncate" load job would work, but does risk changing the schema accidentally. I think some combination of writing to a staging table, then a SQL merge statement to delete existing rows and replace with rows from the staging table would be required to do this atomically without changing the table metadata.

Dec 23 '21 18:12 tswast

could cause failing queries for the time between drop and create

To avoid these problems, a single "truncate" load job would work, but does risk changing the schema accidentally. I think some combination of writing to a staging table, then a SQL merge statement to delete existing rows and replace with rows from the staging table would be required to do this atomically without changing the table metadata.

Could such an implementation also be considered for the replace mode? The time between "drop" and "create" can be significant for larger tables and results in the failure of downstream queries.

Could the replace mode be re-implemented with a more atomic solution, i.e.:

Write to a staging location
Drop the existing table
Rename the staging location to the target location

What would be the downsides?

Jun 27 '22 09:06 jkukul

I like the idea here, but based on current workloads and priorities, going to close this issue as will not fix.

Jun 08 '23 15:06 chalmerlowe