
Stackstorm jobs are stuck and not running

Open nehadubx opened this issue 2 years ago • 0 comments

SUMMARY

StackStorm jobs get stuck and the platform hangs when 400+ alerts are received in the same time interval.

Issue: Alerts get stuck in the Scheduled or Delayed state

Volume: 400+ alerts

Pattern: Each alert must run sequentially

STACKSTORM VERSION

st2 --version: st2 3.7.0, on Python 3.8.10

OS, environment, install method

ST2: Docker in our CaaS container environment, Kubernetes HA. System requirements are aligned with the ST2 documentation.

Steps to reproduce the problem

We face this issue when we receive alerts of the same type for the same host and check value, for which correlation is maintained. When multiple such alerts arrive within the same minute, the first alert starts running and all the others hang. We have a volume of 400+ alerts arriving at StackStorm that require execution.
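To make the pattern concrete, here is a minimal Python model of the behaviour described above: alerts are grouped by (host, check), the first alert in each group runs, and the rest wait their turn. This is only an illustrative sketch, not StackStorm code; the `schedule` function and the `id`, `host` and `check` field names are hypothetical.

```python
from collections import defaultdict, deque


def schedule(alerts):
    """Group alerts by (host, check): the first in each group runs, the rest wait."""
    queues = defaultdict(deque)
    for alert in alerts:
        # Hypothetical correlation key: same host and same check value.
        queues[(alert["host"], alert["check"])].append(alert["id"])

    # One running alert per key; everything behind it is waiting.
    running = {key: queue[0] for key, queue in queues.items()}
    waiting = {key: list(queue)[1:] for key, queue in queues.items()}
    return running, waiting


# 400 alerts of the same type for the same host, arriving together.
alerts = [{"id": i, "host": "host-a", "check": "disk"} for i in range(400)]
running, waiting = schedule(alerts)
print(len(running), len(waiting[("host-a", "disk")]))
```

In this model the waiting alerts should be released one at a time as each running alert finishes. In our environment they stay stuck instead.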

Expected Results

Alerts should not get stuck and should run as expected.

Actual Results

Alerts get stuck and then require manual cleanup and handling.

Need immediate support.

Thanks!

nehadubx, Aug 10 '23 06:08