Fix for deadlock issue #3682
What does this PR do?
This PR fixes #3682. In some instances, `mem.Alerts.Subscribe()` and `store.Alerts.gc()` can deadlock:
- Lock acquisition in `store.Alerts.gc()`:
  - The method `store.Alerts.gc()` acquires a lock on its internal mutex `store.Alerts.mtx`.
  - While holding the lock, it then calls the callback function, which tries to acquire a lock on `mem.Alerts.mtx`.
- Concurrent execution:
  - `mem.Alerts.Subscribe()` acquires a lock on its internal mutex `mem.Alerts.mtx` and calls `store.Alerts.List()`.
- Deadlock situation:
  - The callback function tries to acquire a lock on `mem.Alerts.mtx`, but that lock is already held by `mem.Alerts.Subscribe()`.
  - Similarly, `mem.Alerts.Subscribe()` cannot proceed because `store.Alerts.List()` cannot acquire `store.Alerts.mtx`, which is held by `store.Alerts.gc()`.
To summarize: `store.Alerts.gc()` held `store.Alerts.mtx` until the callback function completed, while the callback was waiting to acquire `mem.Alerts.mtx`. The callback could not acquire that lock because `mem.Alerts.Subscribe()` was holding it, and `Subscribe()` could not progress because its call to `store.Alerts.List()` was waiting on `store.Alerts.mtx`, still held by `store.Alerts.gc()`. This fix releases the lock held by `store.Alerts.gc()` prior to calling the callback function.
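The fix follows a common Go pattern: snapshot the state you need while holding the lock, release the lock, and only then invoke the callbacks. A minimal sketch under assumed names (the `store` type, its fields, and the callback signature here are illustrative, not the real Alertmanager types):

```go
package main

import (
	"fmt"
	"sync"
)

// store holds items protected by mtx; gc removes expired items and
// notifies registered callbacks about what was deleted.
type store struct {
	mtx       sync.Mutex
	items     map[string]bool // value true means "expired"
	callbacks []func(deleted []string)
}

func (s *store) gc() {
	s.mtx.Lock()
	var deleted []string
	for k, expired := range s.items {
		if expired {
			delete(s.items, k)
			deleted = append(deleted, k)
		}
	}
	// Copy the callback slice, then release the lock BEFORE invoking
	// the callbacks. A callback that acquires another mutex (or calls
	// back into the store) can then no longer deadlock against gc().
	cbs := make([]func([]string), len(s.callbacks))
	copy(cbs, s.callbacks)
	s.mtx.Unlock()

	for _, cb := range cbs {
		cb(deleted)
	}
}

func main() {
	s := &store{items: map[string]bool{"a": true, "b": false}}
	s.callbacks = append(s.callbacks, func(deleted []string) {
		fmt.Println("deleted:", deleted) // prints "deleted: [a]"
	})
	s.gc()
}
```

The key design point is that the callbacks run entirely outside the critical section, so the only lock-ordering constraint left is whatever the callbacks themselves impose.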
@rajagopalanand thank you very much for your contribution - can you please fix the linter?
This is a duplicate of https://github.com/prometheus/alertmanager/pull/3648/files, right?
Was not aware of this but I will take a look
The other PR addresses more than just the deadlock described in #3682. If this PR can be reviewed and merged first, then when the other PR is reviewed and merged it can focus on the race conditions and retain the test from this PR.