StackStorm jobs are stuck and not running
SUMMARY
StackStorm jobs get stuck and the platform goes into a hung state when 400+ alerts are received in the same time interval.
Issue: Alerts get stuck in the Scheduled or Delayed state
Volume: 400+ alerts
Pattern: Each alert is expected to run sequentially
STACKSTORM VERSION
st2 --version: st2 3.7.0, on Python 3.8.10
OS, environment, install method
ST2: Docker in our CaaS container, Kubernetes HA. System requirements are aligned with the ST2 documentation.
Steps to reproduce the problem
The issue occurs when we receive multiple alerts of the same type for the same host and check value, which is the key on which correlation is maintained. When several such alerts arrive within the same minute, the first alert starts running and all the others hang. We have a volume of 400+ alerts coming into StackStorm that require execution.
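To make the pattern concrete, here is a minimal Python sketch of what we observe. It is not StackStorm code and does not call any StackStorm API. It only models a burst of alerts that share one correlation key, assuming at most one execution per (host, check) pair may run at a time. The function name `simulate_burst`, the `threshold` parameter, the host name `host-01`, and the check name `disk_usage` are hypothetical, and the status strings are just labels for the states we see.

```python
from collections import defaultdict


def simulate_burst(alerts, threshold=1):
    """Label each alert 'running' or 'delayed'.

    Assumption: at most `threshold` executions may run at once for a
    given (host, check) correlation key; the rest have to wait.
    """
    running = defaultdict(int)  # correlation key -> executions currently running
    statuses = []
    for alert in alerts:
        key = (alert["host"], alert["check"])
        if running[key] < threshold:
            running[key] += 1
            statuses.append("running")
        else:
            # Waits for the running execution; in our case it never proceeds.
            statuses.append("delayed")
    return statuses


# Hypothetical burst: 400 alerts for the same host and check in the same minute.
burst = [{"host": "host-01", "check": "disk_usage"} for _ in range(400)]
statuses = simulate_burst(burst)
print(statuses.count("running"), statuses.count("delayed"))
```

In this model one alert runs and the other 399 wait, which matches what we see. The difference in our environment is that the waiting alerts are never picked up after the first one finishes.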
Expected Results
Alerts should not get stuck and should run as expected by the product.
Actual Results
Alerts get stuck and then require manual cleanup and handling.
Need immediate support.
Thanks!