sql,cloud: cloud SQL shell can crash a node
It's possible to crash a node in a CC cluster through the SQL shell by attempting to run COMMIT or ROLLBACK. This happens because requests from the cloud SQL shell run through an internal executor, which does not support committing or rolling back the transaction, so a panic results. A crash occurs for similar reasons after SHOW COMMIT TIMESTAMP. There may be other statements that are incompatible with the internal executor as well.
The following is a screenshot I took of a graph of Kubernetes node restarts after running COMMIT and ROLLBACK through the shell on a CC dedicated test cluster:
We probably need to set up a list of disallowed statements for the cloud shell. For reference, we recently introduced a crdb_internal.execute_internally builtin function that has to do something similar: https://github.com/cockroachdb/cockroach/blob/5d90eb7f6d58aa23882c3dd7e5649cabc064b3e4/pkg/sql/sem/builtins/generator_builtins.go#L3595-L3602
However, we may want to relax the restriction somewhat, since it probably prohibits some safe statements.
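As a rough illustration of the disallowed-statement idea, here is a minimal Go sketch that rejects statements by their leading keywords before they reach the internal executor. Everything in it is hypothetical: `disallowedPrefixes`, `checkAllowed`, and `hasPrefix` are made-up names, the list only covers the three statements known to crash, and it is not how `crdb_internal.execute_internally` performs its check. A real implementation would presumably inspect the parsed statement rather than match keywords.

```go
package main

import (
	"fmt"
	"strings"
)

// disallowedPrefixes is a hypothetical denylist of leading keywords for
// statements that the internal executor cannot run. It covers only the
// statements named in this issue and is almost certainly incomplete.
var disallowedPrefixes = [][]string{
	{"COMMIT"},
	{"ROLLBACK"},
	{"SHOW", "COMMIT", "TIMESTAMP"},
}

// checkAllowed returns an error if stmt begins with a disallowed prefix.
// It matches on whitespace-separated keywords only, so it handles a single
// statement and does not split multi-statement input.
func checkAllowed(stmt string) error {
	trimmed := strings.TrimRight(strings.TrimSpace(stmt), ";")
	words := strings.Fields(strings.ToUpper(trimmed))
	for _, prefix := range disallowedPrefixes {
		if hasPrefix(words, prefix) {
			return fmt.Errorf(
				"statement %q is not supported in the cloud SQL shell",
				strings.Join(prefix, " "),
			)
		}
	}
	return nil
}

// hasPrefix reports whether words starts with every element of prefix.
func hasPrefix(words, prefix []string) bool {
	if len(words) < len(prefix) {
		return false
	}
	for i, w := range prefix {
		if words[i] != w {
			return false
		}
	}
	return true
}
```

Note that prefix matching on ROLLBACK also rejects ROLLBACK TO SAVEPOINT, which shows how a coarse list can end up prohibiting more than intended.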
Jira issue: CRDB-38983
We already add some context to any error returned by the internal executor in the run-query-via-api interface. We should consider also adding a panic-catcher with some logging in case we run into another assertion. We should make sure to log this to stderr in particular.
We might also consider turning all panics into explicitly returned errors, although we'd have to make sure this is safe.
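A panic-catcher along these lines could look roughly like the Go sketch below. `runGuarded` is a hypothetical helper, not an existing function in the run-query-via-api code; it only shows the general defer/recover pattern of logging the panic and stack trace to stderr and returning an error instead of crashing the process.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// runGuarded calls fn and converts any panic raised on the calling
// goroutine into a returned error. The panic value and a stack trace
// are written to stderr before the error is returned.
func runGuarded(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Fprintf(os.Stderr,
				"panic while executing shell query: %v\n%s", r, debug.Stack())
			err = fmt.Errorf("internal error: %v", r)
		}
	}()
	return fn()
}
```

A caller would wrap the internal executor invocation, for example `err := runGuarded(func() error { return runQuery() })`, where `runQuery` stands in for whatever the API handler actually calls. Two caveats apply: `recover` only catches panics on the same goroutine, and swallowing a panic can leave shared state (such as an open transaction) inconsistent, which is part of why the safety question above matters.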
[quoting @DrewKimball during postmortem] steps to reproduce:
- start cloud dedicated cluster (serverless would probably also work, but we don't have observability into killing a node in a serverless cluster)
- open the cloud SQL console
- send either COMMIT; or ROLLBACK; or SHOW COMMIT TIMESTAMP;