DELETE syntax for MSQ
Description
Currently, MSQ supports INSERT (for adding data) and REPLACE (for replacing rows). REPLACE can, however, also drop segments by replacing them with an empty set. This works by reingesting all of the data except the part to be deleted, which is selected by some condition.
This is a proposal to add a new DELETE syntax that would let users state what they want to do more easily, without having to understand the concept of reindexing. Internally, a DELETE would be translated to a REPLACE query with the condition of the DELETE inverted, so that all rows except the ones matched by the DELETE are reingested. As a result, few changes would be needed in the MSQ part of Druid.
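To see what inverting the condition does, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for a datasource. It is not Druid and says nothing definitive about Druid's behavior. The table contents are made up for illustration. The point it illustrates is that under standard SQL three-valued logic, a plain NOT(condition) does not keep rows where the condition evaluates to NULL, so a null-safe inversion may be needed. Whether this applies to Druid depends on its null-handling configuration, which is an assumption to verify.

```python
import sqlite3

# In-memory SQLite table standing in for the datasource (NOT Druid).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)",
    [
        ("New Zealand", "Auckland"),
        ("Australia", "Sydney"),
        (None, "Unknown"),  # row whose country is NULL
    ],
)

condition = "country = 'New Zealand'"

# Rows that a DELETE with this condition is meant to remove.
deleted = conn.execute(
    f"SELECT city FROM stats WHERE {condition}"
).fetchall()

# Rows kept by the plain inverted condition, as in the proposal.
kept_plain = conn.execute(
    f"SELECT city FROM stats WHERE NOT ({condition}) ORDER BY city"
).fetchall()

# Rows kept when a NULL comparison result counts as "do not delete".
kept_null_safe = conn.execute(
    f"SELECT city FROM stats "
    f"WHERE NOT ({condition}) OR ({condition}) IS NULL ORDER BY city"
).fetchall()

print(deleted, kept_plain, kept_null_safe)
```

With the plain inversion the NULL-country row is neither deleted nor kept, so it would be lost by the reingestion. The null-safe form keeps it.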
Example:
DELETE FROM stats
WHERE country = 'New Zealand'
PARTITIONED BY MONTH
CLUSTERED BY city
would be translated to
REPLACE INTO stats OVERWRITE ALL
SELECT * FROM stats
WHERE NOT(country = 'New Zealand')
PARTITIONED BY MONTH
CLUSTERED BY city
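The translation above can be sketched as a small string-building function. This is only an illustration of the rewrite rule: the function name `delete_to_replace` and its parameters are hypothetical, and an actual implementation would presumably operate on the parsed query rather than on strings, which also avoids the quoting problems that plain string formatting has.

```python
from typing import Optional


def delete_to_replace(
    table: str,
    condition: str,
    partitioned_by: str,
    clustered_by: Optional[str] = None,
) -> str:
    """Build the REPLACE statement that a DELETE would be translated to.

    Hypothetical helper: it only mirrors the textual rewrite shown in the
    example and does no validation or quoting of its inputs.
    """
    lines = [
        f"REPLACE INTO {table} OVERWRITE ALL",
        f"SELECT * FROM {table}",
        # Invert the DELETE condition so that only the other rows are reingested.
        f"WHERE NOT({condition})",
        f"PARTITIONED BY {partitioned_by}",
    ]
    if clustered_by:
        lines.append(f"CLUSTERED BY {clustered_by}")
    return "\n".join(lines)


sql = delete_to_replace("stats", "country = 'New Zealand'", "MONTH", "city")
print(sql)
```

Called with the values from the example, it produces the same REPLACE statement as shown above.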
Syntax
DELETE FROM "table name"
WHERE "condition"
PARTITIONED BY "partitioning"
CLUSTERED BY "clustering"
This is similar in structure to a DELETE query from SQL, with the addition of partitioning and clustering. Ideally, a DELETE should not require these to be defined. However, because the table is reindexed internally, they are required. If there were a mechanism to get the partitioning/clustering of a datasource while parsing, it would be possible to make those parameters optional, making the query simpler:
DELETE FROM "table name"
WHERE "condition"
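One way to picture the optional form is a lookup that falls back to the stored layout of the datasource when a clause is omitted. The sketch below assumes such a mechanism exists. The `CATALOG` dictionary, the `TableLayout` type, and `resolve_layout` are all hypothetical names used for illustration, not Druid APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TableLayout:
    partitioned_by: str
    clustered_by: Tuple[str, ...] = ()


# Hypothetical stand-in for whatever would expose a datasource's existing layout.
CATALOG: Dict[str, TableLayout] = {
    "stats": TableLayout(partitioned_by="MONTH", clustered_by=("city",)),
}


def resolve_layout(
    table: str,
    partitioned_by: Optional[str] = None,
    clustered_by: Optional[Tuple[str, ...]] = None,
) -> TableLayout:
    """Use the clauses given in the query, falling back to the stored layout."""
    stored = CATALOG.get(table)
    if partitioned_by is None:
        if stored is None:
            # Without a stored layout, the partitioning must be given explicitly.
            raise ValueError(f"PARTITIONED BY is required for table {table!r}")
        partitioned_by = stored.partitioned_by
    if clustered_by is None:
        clustered_by = stored.clustered_by if stored is not None else ()
    return TableLayout(partitioned_by, tuple(clustered_by))


print(resolve_layout("stats"))
print(resolve_layout("stats", partitioned_by="DAY"))
```

An explicit clause in the query takes precedence over the stored value, so the longer syntax would keep working unchanged.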
PR
There is one PR, which is still a work in progress.
https://github.com/apache/druid/pull/13674
Can I work on this issue?
@imSanko Sure! There was some work initially done as a part of the PR https://github.com/apache/druid/pull/13674, however, that was put on hold due to some related work that might change how the feature looks. LMK in case you need help to proceed further.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.