Lock hang at the table cut-over stage
We recently hit a lock hang at the table cut-over stage. After analysis, I suspect it is related to PR #888. The problem occurs when atomicCutOver() handles the "Timeout while waiting for events up to lock" error. After the timeout error, the current atomicCutOver is cancelled and retried. On cancellation, the deferred function runs, which includes okToUnlockTable <- true and this.applier.DropAtomicCutOverSentryTableIfExists(). In addition, applier.AtomicCutOverMagicLock drops the magic cut-over table after it receives from the okToUnlockTable channel.
The PR therefore uses sync.Once to avoid sending the drop of the cut-over sentry table to MySQL twice. But if the drop table operation is executed by applier.DropAtomicCutOverSentryTableIfExists() first, it blocks in the state "Waiting for table metadata lock", while the actual lock owner, applier.AtomicCutOverMagicLock, is stuck here (on the Once mutex lock), waiting for the former to complete. Neither side can make progress.
This can be reproduced by injecting faults: force a timeout error here, before waiting for the events up to lock, and wait a few seconds here to make sure that the drop table is invoked by DropAtomicCutOverSentryTableIfExists().
Thank you!
@cenkore Hello, is it possible to move the drop table action into the defer function of AtomicCutOverMagicLock? That way the sentry table is created and dropped in the same goroutine, which avoids this problem.