Review each unknown_server_error in the transactional area (tx_gateway_frontend, rm_stm)
unknown_server_error is a fatal error (an app is required to recreate a producer) and we should return it only if it's the only way to handle the situation.
We may make an operation idempotent and retry it until it passes or times out (see begin_tx, commit_tx). If the tx coordinator fails, the caller may start looking for a new coordinator and redirect the request along with the internal id info (tx_seq) so the new coordinator can dedupe it.
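A minimal sketch of the retry-plus-dedupe idea, in Python for brevity rather than the actual C++ code. Every name here (`TransientError`, `Coordinator`, `FlakyLink`, `retry_until`) is hypothetical and only illustrates the shape: the coordinator remembers results by `(producer_id, tx_seq)`, so a retried request is answered from the cache instead of being applied twice, and the caller surfaces a timeout instead of a fatal error once the deadline passes.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a retriable failure (lost reply, coordinator moved)."""


class Coordinator:
    """Toy coordinator that dedupes requests by (producer_id, tx_seq)."""

    def __init__(self):
        self.applied = {}  # (producer_id, tx_seq) -> cached result
        self.commits = 0   # how many times a commit was really applied

    def commit_tx(self, producer_id, tx_seq):
        key = (producer_id, tx_seq)
        if key in self.applied:
            # Replayed request: return the earlier result, apply nothing.
            return self.applied[key]
        self.commits += 1
        self.applied[key] = "ok"
        return "ok"


class FlakyLink:
    """Applies the request, then loses the reply for the first `failures` calls."""

    def __init__(self, coordinator, failures):
        self.coordinator = coordinator
        self.failures = failures

    def commit_tx(self, producer_id, tx_seq):
        result = self.coordinator.commit_tx(producer_id, tx_seq)
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("reply lost")
        return result


def retry_until(op, timeout_s, backoff_s=0.01):
    """Retry `op` on TransientError until it succeeds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return op()
        except TransientError:
            if time.monotonic() >= deadline:
                # Assumption: a timeout is reported as retriable, not fatal.
                raise TimeoutError("operation did not succeed before the deadline")
            time.sleep(backoff_s)
```

With a link that drops the first two replies, `retry_until(lambda: link.commit_tx(1, 7), timeout_s=1.0)` returns `"ok"` while the coordinator applies the commit only once, because the retries carry the same `tx_seq`.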
Good news: I added chaos tests (so far in a private branch - https://github.com/rystsov/chaos/tree/unknown_server_error) that fail on transient unknown server errors, so we have an easy way to reproduce the issue.
/backport v22.3.x