Better explanation of recovery stall due to policy

Open · ajbeamon opened this issue 5 years ago · 0 comments

We had a report that someone configured a multi-region cluster with satellites, and the cluster failed to recover after they excluded the satellite processes. This is the status reported by fdbcli:

db> status details
Using cluster file `fdb-primary.cluster'.
Recruiting new transaction servers.
Need at least 2 log servers across unique zones, 1 commit proxies, 1 GRV
proxies and 1 resolvers.

Have 8 non-excluded processes on 8 machines across 8 zones.

This message implies that the cluster lacks the processes required to satisfy its policy (which is correct), but the information it shows suggests the opposite: 8 non-excluded processes across 8 zones looks like more than enough to place 2 log servers in unique zones. The message should state more clearly what is actually missing.
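The confusion comes from reporting a cluster-wide total while the requirement that fails applies to a specific location. A minimal sketch of a more explicit check, assuming a simplified model in which each datacenter must host some number of log servers on distinct zones, is shown below. `Process`, `explain_log_shortfall`, and the `required` mapping are hypothetical illustrations, not FoundationDB's actual recruitment code or policy engine, and the example numbers are only loosely modeled on the report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    # Hypothetical, simplified view of a worker process.
    address: str
    dc: str         # datacenter id, e.g. "primary" or "satellite"
    zone: str       # fault-tolerance zone id
    excluded: bool  # True if the process has been excluded


def explain_log_shortfall(processes, required):
    """Return readable reasons why log recruitment cannot succeed.

    `required` maps a datacenter id to the number of log servers that must
    be placed there on distinct zones (a stand-in for a real policy).
    An empty list means every datacenter has enough usable zones.
    """
    # Count distinct zones per datacenter, ignoring excluded processes.
    zones_by_dc = defaultdict(set)
    for p in processes:
        if not p.excluded:
            zones_by_dc[p.dc].add(p.zone)

    problems = []
    for dc, needed in sorted(required.items()):
        have = len(zones_by_dc[dc])
        if have < needed:
            problems.append(
                f"DC '{dc}' needs {needed} log server(s) across unique zones "
                f"but has only {have} non-excluded zone(s)"
            )
    return problems


if __name__ == "__main__":
    # Eight usable processes in the primary, one excluded satellite process.
    procs = [Process(f"10.0.0.{i}:4500", "primary", f"z{i}", False) for i in range(8)]
    procs.append(Process("10.0.1.1:4500", "satellite", "s1", True))
    for line in explain_log_shortfall(procs, {"primary": 2, "satellite": 1}):
        print(line)
```

Here the cluster-wide summary would still read "8 non-excluded processes across 8 zones", while the per-datacenter message points directly at the satellite that cannot host a log server.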

See https://forums.foundationdb.org/t/how-to-recover-fdb-database-from-attempt-of-excluding-the-single-sattellite-node/2381.

ajbeamon · Oct 09 '20 15:10