Add an is_healthy() check into up(): migrations currently fail if patch_clusterwide is still running
If the clusterwide config has not been applied yet after cluster start, the migration will fail. This is a common scenario in tests.
See:
code: 32
message: AtomicCallError: cartridge.patch_clusterwide is already running
stack traceback:
/app/.rocks/share/tarantool/cartridge/twophase.lua:583: in function 'config_patch_clusterwide'
/app/.rocks/share/tarantool/migrator.lua:105: in function 'up'
eval:1: in main chunk
[C]: at 0x006163c0
Maybe we could introduce an is_healthy() check or a timeout into up() and wait until the config is applied?
For now I have to use a workaround like this:
if (!container.isRunning()) {
    container.start();
}

// Poll cartridge.is_healthy() for up to ~30 seconds until the
// clusterwide config is applied.
boolean healthy = false;
int attempts = 30;
while (!healthy && attempts-- > 0) {
    List<?> result = container.executeCommand("return require('cartridge').is_healthy()").get();
    log.info("Checking cluster healthy status: {}, {}", result.size(), result);
    if (result.size() == 1) {
        healthy = (Boolean) result.get(0);
    }
    if (!healthy) {
        Thread.sleep(1000);
    }
}
if (!healthy) {
    throw new RuntimeException("Failed to get cluster in healthy state");
}

// Only now is it safe to run the migrations.
container.executeCommand("require('migrator').up()").get();
Waiting in up() until the clusterwide config is applied, or until a startup_timeout is reached, sounds meaningful to me. This startup_timeout should only apply to waiting for cluster bootstrapping, not to the migration itself. (The name of the option is up for discussion; maybe bootstrap_timeout.)
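As a rough sketch of the idea (the wait_for_bootstrap helper, its name, and the startup_timeout option are hypothetical, not part of the current migrator code), up() could poll the same is_healthy() check with a deadline before calling config_patch_clusterwide:

local cartridge = require('cartridge')
local clock = require('clock')
local fiber = require('fiber')

-- Hypothetical helper: wait until the cluster reports healthy,
-- or give up after `timeout` seconds.
local function wait_for_bootstrap(timeout)
    local deadline = clock.monotonic() + timeout
    while clock.monotonic() < deadline do
        -- the same check the Java workaround above runs via executeCommand()
        if cartridge.is_healthy() then
            return true
        end
        fiber.sleep(0.1)
    end
    return nil, ('cluster is still not healthy after %s seconds'):format(timeout)
end

-- up() would call wait_for_bootstrap(startup_timeout) before invoking
-- config_patch_clusterwide; the migration itself would not be limited
-- by this timeout.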
We should also add the startup_timeout (or however we end up naming it) query parameter to the /migrations/up HTTP endpoint.
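For example, a request like /migrations/up?startup_timeout=60 (the exact parameter name is still to be decided) would make the endpoint wait for the cluster to become healthy before running the migrations.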
I'll add it to our backlog without a deadline. Reach out to me if you want to raise its priority.