Adding `if_exists="override" Mode
Is your feature request related to a problem? Please describe.
is there mode to do truncate load for existing table if_exists="override", it could be useful to avoid dropping table
Describe the solution you'd like being able to override data in existing table.
Can you share a bit more about your use case? You'd like to replace the data, but keep the table intact? I suppose this is related to your other request about preserving clustering/partitioning?
correct i like to replace the data, but keep the table intact. dropping table always has risk of losing table metadata, and could cause failing queries for the time between drop and create. override is safer operation.
What if we call it "truncate" to match what the API is actually doing? If/when we support non-load jobs, we can run a DML "truncate" operation first via SQL.
Edit: On second thought, the "truncate" write disposition can ignore the existing schema. I think this safer replace requires a DML drop rows, followed by an append rows job, even without using streaming.
could cause failing queries for the time between drop and create
To avoid these problems, a single "truncate" load job would work, but does risk changing the schema accidentally. I think some combination of writing to a staging table, then a SQL merge statement to delete existing rows and replace with rows from the staging table would be required to do this atomically without changing the table metadata.
could cause failing queries for the time between drop and create
To avoid these problems, a single "truncate" load job would work, but does risk changing the schema accidentally. I think some combination of writing to a staging table, then a SQL merge statement to delete existing rows and replace with rows from the staging table would be required to do this atomically without changing the table metadata.
Could such an implementation also be considered for the replace mode? The time between "drop" and "create" can be significant for larger tables and results in the failure of downstream queries.
Could the replace mode be re-implemented with a more atomic solution, i.e.:
- Write to a staging location
- Drop the existing table
- Rename the staging location to the target location
What would be the downsides?
I like the idea here, but based on current workloads and priorities, going to close this issue as will not fix.