Reinstalled custom pack is not rendered on the Web UI
We uninstalled one of our custom packs and reinstalled it using `st2 pack install` (with the file path containing all the actions, rules, and workflows).
By running this command, the files were installed on the K8s cluster. However, on the StackStorm GUI the workflows and actions are not rendering properly. The pack still shows the older actions, even after the pack was uninstalled.
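For reference, the uninstall/reinstall flow described above looks roughly like this; the pack name and file path below are placeholders, not the exact values used here:

```shell
# placeholders only: substitute the real pack name and path
st2 pack remove <pack_name>
st2 pack install file:///<path-to-local-pack-directory>
st2 pack list    # confirm which packs StackStorm has registered
```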
Uninstalled the custom pack:

Reinstalled the same custom pack:

We are not able to see the new actions in the Web UI of the K8s instance, and the edit parameters and executions options are not visible on the right-hand side of the Actions tab:

Is this a known issue? How do we enable the options on the right-hand side of the Actions tab?
@armab Could you please look into this issue?
You have not answered the question here: https://github.com/StackStorm/stackstorm-k8s/issues/313#issuecomment-1147514998
This issue and #313 probably have the same cause. To help you, we need to know how you have configured packs. In particular, what are your values for `st2.packs.volumes` and `st2.packs.images`?
@cognifloyd @armab We have used Shared Volumes to install the custom packs as per the documentation. https://docs.stackstorm.com/install/k8s_ha.html#custom-st2-packs

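For context, the Method 2 (shared volumes) configuration from that page has roughly this shape when backed by PVCs; the claim names below are placeholders, the actual values are in the screenshot above:

```yaml
st2:
  packs:
    volumes:
      enabled: true
      packs:
        persistentVolumeClaim:
          claimName: stackstorm-packs-pvc        # placeholder
      virtualenvs:
        persistentVolumeClaim:
          claimName: stackstorm-virtualenvs-pvc  # placeholder
      configs:
        persistentVolumeClaim:
          claimName: stackstorm-configs-pvc      # placeholder
```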
One of the API calls is returning a 404 response. We have not used any custom APIs apart from the default StackStorm APIs.
[email protected] - Please look into this
While trying to run an action from the custom pack, it gives an error. However, an action from the built-in pack "core" executes fine.
`st2 run cnas.trigger_notification` is the action from the custom pack cnas.
`st2 run core.local cmd="echo 'OK'"` works without any error.
Again: please show me your values. Feel free to replace any secrets with ****.
Looking at the command you are running, you need to be careful about your use of '.
It looks like you are trying to send through a parameter called message with a JSON string of {'title':'..', ... }. But as your outer quote is the same type as the inner ones, it is all going to get a bit messed up.
You get away with it on the core.local call: sending cmd='echo 'OK'' is actually treated as the concatenation of the strings:
- 'echo '
- OK
- ''
Hence why, in the parameters to the first call, it says the cmd passed was echo OK rather than echo 'OK'.
So I'd suggest using "" as the outer quotes, so that the single quotes are kept.
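A minimal illustration of the quoting difference (the message value below is a stand-in for the real JSON):

```shell
# Outer single quotes: the shell concatenates 'echo ' + OK + '' and the inner quotes are lost
st2 run core.local cmd='echo 'OK''     # the action receives: echo OK

# Outer double quotes: the inner single quotes survive
st2 run core.local cmd="echo 'OK'"     # the action receives: echo 'OK'

# Same idea for the JSON-like parameter (keys/values here are stand-ins):
st2 run cnas.trigger_notification message="{'title': 'test', 'text': 'test'}"
```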
> Again: please show me your values. Feel free to replace any secrets with ****.
@cognifloyd Here is the values.yaml file for your reference below. We have used Shared Volumes for the custom pack. Initially, when we tried to install the custom pack cnas, the pack and its underlying actions/workflows got registered. However, when we try to update the pack (by uninstalling the previous pack and reinstalling the cnas pack), the files are not rendered properly on the UI; it still shows the older files. You can see the snapshots above.

@amanda11 Thanks for the suggestion. Yes, I gave the parameter with double quotes and tried executing the action. But it failed.

Ok, so you are using persistentVolumeClaims for the packs and virtualenvs volumes. Could you show the values for the virtualenvs volume and the configs volume as well?
Can you also share your persistentVolume resource definition? (You would have created this outside the chart.)
@cognifloyd Here are the values for the virtualenvs volume and the configs volume:

Also, the persistentVolume resource definition outside the chart:

Access Mode: RWO is probably the source of your issues.
RWO, or ReadWriteOnce: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
> the volume can be mounted as read-write by a single node. ReadWriteOnce access mode still can allow multiple pods to access the volume when the pods are running on the same node.
Are all of your pods on the same node? If not, then only the pods on the node with the read/write mount will be able to successfully modify the files. Plus, the read/write mounts for virtualenvs and packs could end up on different nodes, leading to partially successful pack installs.
At a minimum, you need a PVC+PV with ReadWriteMany access mode.
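For illustration, a PVC requesting RWX could look like the sketch below; the name and storage class are placeholders, and the backing storage must actually support ReadWriteMany:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stackstorm-packs-pvc                       # placeholder
spec:
  accessModes:
    - ReadWriteMany                                # RWX: writable from pods on any node
  storageClassName: <rwx-capable-storage-class>    # e.g. backed by NFS, Azure Files, CephFS
  resources:
    requests:
      storage: 1Gi
```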
Good find @cognifloyd :100:
While https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes mentions the use of NFS shares as an example (which provide RWX capabilities under the hood), it looks like it would make sense to explicitly mention the ReadWriteMany requirement in the README for this way of sharing the pack content.
@Bindu-yadav8 could you please create a PR against https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes and mention that somewhere? It would definitely help others.
@cognifloyd @armab We tried to delete the existing PVC in order to reapply the changes, but the PVC is stuck in the "Terminating" state as shown below:
We also checked for volume attachments and tried to patch the PVC, thinking that might fix the issue, but it didn't work.

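As a general note (not specific to this cluster), a PVC usually hangs in Terminating because the kubernetes.io/pvc-protection finalizer is waiting for pods that still mount it; something along these lines can help inspect it and, as a last resort once nothing uses the claim anymore, clear it (the PVC name is a placeholder):

```shell
kubectl describe pvc <pvc-name>                                    # "Used By" shows pods still mounting it
kubectl get pvc <pvc-name> -o jsonpath='{.metadata.finalizers}'    # usually kubernetes.io/pvc-protection
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}' # last resort: drop the finalizer
```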
@Bindu-yadav8 Do you have a storage class backed by the infrastructure that supports ReadWriteMany?
"Many" means a distributed file system shared across the nodes; it's not just a usual volume. You need something like Ceph or NFS or similar.
See the table here with different providers: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
@armab good catch.
In the screenshot above I see this is AzureDisk. As per the link you posted, it does not offer that capability, so an appropriate StorageClass will need to be configured for the PVC for this to work.
@armab We are using Azure Disk

@Bindu-yadav8 as suggested, please consider using a different StorageClass that supports the ReadWriteMany feature.
Perhaps this could help: https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
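A sketch of an RWX-capable Azure Files StorageClass along the lines of that doc; the provisioner and parameters depend on the AKS setup and are illustrative only:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-rwx              # placeholder
provisioner: file.csi.azure.com    # or kubernetes.io/azure-file on older in-tree setups
allowVolumeExpansion: true
parameters:
  skuName: Standard_LRS
mountOptions:
  - uid=1000                       # illustrative; the st2 containers need rw access to the share
  - gid=1000
```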
@armab @cognifloyd We need your assistance in deleting the existing PVC, since it has been in the Terminating state for a long time now.

Hi @armab @cognifloyd,
We have added an NFS share for packs, virtual environments, and configs. We connected to the "st2client" pod and cloned our custom pack repository inside it. While installing the custom pack, we ran this command inside the client pod (stackstorm-ha-st2client):
st2 pack install file://custompack
However, we are seeing the below error:
stderr: "st2.actions.python.St2RegisterAction: DEBUG Calling client method "register" with kwargs "{'types': ['all'], 'packs': ['cnas']}"
Traceback (most recent call last):
File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 395, in

Even when we place the custom pack in the shared NFS location and make a POST call to the StackStorm API to install the pack, we see the same error. Changes in the custom pack are not reflected in the Web UI; it still shows the old installation.
API: POST method - https://api.stackstorm.com/api/v1/packs/#/packs_controller.install.post
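That link is the API documentation for the endpoint; the actual request goes to the deployment's own st2web/st2api, roughly like this (host, token, and pack path are placeholders):

```shell
curl -sk -X POST "https://<st2web-host>/api/v1/packs/install" \
  -H "X-Auth-Token: <token>" \
  -H "Content-Type: application/json" \
  -d '{"packs": ["file://<path-to-pack-on-the-shared-volume>"]}'
```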
The key piece of that error message is:
MESSAGE: Failed to register pack "cnas": Pack "/opt/stackstorm/packs/cnas" is missing pack.yaml
Is there a pack.yaml file in /opt/stackstorm/packs/cnas?
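For reference, registration only succeeds when a pack.yaml sits at the pack root; a minimal sketch (field values are illustrative) would be:

```yaml
# /opt/stackstorm/packs/cnas/pack.yaml -- minimal illustrative example
ref: cnas
name: cnas
description: Custom CNAS pack
version: 0.1.0
author: Example Team            # placeholder
email: team@example.com         # placeholder
python_versions:
  - "3"
```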
@cognifloyd We had the pack.yaml before issuing the st2 pack install command; after issuing the command, we see that pack.yaml is no longer there.
I see you used st2 pack install file://custompack based on a local pack.
Do you see the same issue if you install a pack from the official StackStorm Exchange or maybe from a custom git repository?
> I see you used st2 pack install file://custompack based on a local pack. Do you see the same issue if you install a pack from the official StackStorm Exchange or maybe from a custom git repository?
Yes, when I tried to install the email pack from StackStorm Exchange, the error below appeared.
2022-09-16 13:36:22,808 ERROR [-] [63247bb3e1a857d29fce672c] Workflow execution completed with errors.
2022-09-16 13:36:22,815 ERROR [-] [63247bb3e1a857d29fce672c] {'type': 'error', 'message': 'Execution failed. See result for details.', 'task_id': 'register_pack', 'result': {'stdout': '', 'stderr': 'st2.actions.python.St2RegisterAction: DEBUG Calling client method "register" with kwargs "{\'types\': [\'all\'], \'packs\': [\'email\']}"\nTraceback (most recent call last):\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 395, in <module>\n obj.run()\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 214, in run\n output = action.run(**self._parameters)\n File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 76, in run\n result = self._run_client_method(\n File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 155, in _run_client_method\n result = method(**method_kwargs)\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 45, in decorate\n return func(*args, **kwargs)\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 684, in register\n self.handle_error(response)\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 218, in handle_error\n response.raise_for_status()\n File "/opt/stackstorm/st2/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status\n raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 400 Client Error: Bad Request\nMESSAGE: Failed to register pack "email": Pack "/opt/stackstorm/packs/email" is missing pack.yaml file for url: http://stackstorm-ha-st2api:9101/v1/packs/register\n', 'exit_code': 1, 'result': 'None'}}
Yeah, same message about the missing pack.yaml.
Can you check the contents of /opt/stackstorm/packs/email to see whether pack.yaml is present or not?
Also verify that the contents of that directory are in sync across all the pods, like st2api and st2actionrunner.
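One way such a cross-pod comparison could look (the deployment names below are typical for this chart but should be treated as placeholders):

```shell
kubectl exec deploy/stackstorm-ha-st2api          -- ls -la /opt/stackstorm/packs/email
kubectl exec deploy/stackstorm-ha-st2actionrunner -- ls -la /opt/stackstorm/packs/email
kubectl exec deploy/stackstorm-ha-st2client       -- ls -la /opt/stackstorm/packs/email
```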
@armab & @cognifloyd We see the default packs in the st2api, st2actionrunner, and st2client pods at /opt/stackstorm/packs, but there is no email or cnas folder there.
We also see a log like amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue 'st2.preinit' in vhost '/' is down or inaccessible.
But when we did a curl from the client pod to the RabbitMQ endpoint, we got the RabbitMQ management page. Please suggest.

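A couple of broker-side checks this error usually points at, run from one of the rabbitmq pods (illustrative only):

```shell
rabbitmqctl list_queues -p / name durable state | grep st2   # is st2.preinit present and running?
rabbitmqctl cluster_status                                   # are all cluster members listed as running?
```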
Check the RabbitMQ logs for any errors.
Also, could you provide more information about your K8s cluster? The version, setup, resources behind the cluster. Is it a bare-metal or cloud environment?
@armab & @cognifloyd We checked the RabbitMQ logs and cluster status. The cluster status shows it running from all 3 pods, and ping and erlang_cookie_sources are listed too, but the logs in the RabbitMQ pods have the errors below.
Pod Logs=======================
** Last message in was emit_stats
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},true,false,none,[],<0.1770.0>,[],[],[],[{vhost,<<"/">>},{name,<<"ha">>},{pattern,<<>>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>},{<<"ha-sync-batch-size">>,10},{<<"ha-sync-mode">>,<<"automatic">>}]},{priority,0}],undefined,[{<0.1773.0>,<0.1770.0>}],[],live,0,[],<<"/">>,#{user => <<"admin">>},rabbit_classic_queue,#{}},none,true,rabbit_mirror_queue_master,{state,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},<0.1773.0>,<0.1772.0>,rabbit_priority_queue,{passthrough,rabbit_variable_queue,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},{0,{[],[]}},{0,{[],[]}},0,{0,nil},{0,nil},{0,nil},{qistate,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/6OG9XBWUJYQSYMSWOI75W46QV",{#{},[]},undefined,0,32768,#Fun<rabbit_variable_queue.10.100744432>,#Fun<rabbit_variable_queue.11.100744432>,{0,nil},{0,nil},[],[],{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>}},{{client_msstate,<0.519.0>,<<103,101,208,145,209,39,162,172,115,108,112,70,114,234,217,68>>,#{},{state,#Ref<0.1326239266.4150919169.134558>,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent"},rabbit_msg_store_ets_index,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent",<0.522.0>,#Ref<0.1326239266.4150919169.134559>,#Ref<0.1326239266.4150919169.134555>,#Ref<0.1326239266.4150919169.134560>,#Ref<0.1326239266.4150919169.134561>,{4000,800}},{client_msstate,<0.515.0>,<<63,232,113,194,157,163,163,40,156,37,...>>,...}},...}},...},...}
** Reason for termination ==
** {{badmatch,{error,not_found}},[{rabbit_amqqueue_process,i,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1151}]},{rabbit_amqqueue_process,'-infos/2-fun-0-',4,[{file,"src/rabbit_amqqueue_process.erl"},{line,1070}]},{lists,foldr,3,[{file,"lists.erl"},{line,1276}]},{rabbit_amqqueue_process,emit_stats,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1177}]},{rabbit_amqqueue_process,handle_info,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1683}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
2022-09-19 10:59:00.978 [error] <0.4526.5> Restarting crashed queue '**st2.sensor.watch.sensor_container**-1640d834e2' in vhost '/'.
2022-09-19 10:59:00.978 [info] <0.1770.0> [{initial_call,{rabbit_prequeue,init,['Argument__1']}},{pid,<0.1770.0>},{registered_name,[]},{error_info,{exit,{{badmatch,{error,not_found}},[{rabbit_amqqueue_process,i,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1151}]},{rabbit_amqqueue_process,'-infos/2-fun-0-',4,[{file,"src/rabbit_amqqueue_process.erl"},{line,1070}]},{lists,foldr,3,[{file,"lists.erl"},{line,1276}]},{rabbit_amqqueue_process,emit_stats,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1177}]},{rabbit_amqqueue_process,handle_info,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1683}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]},[{gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1183}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}},{ancestors,[<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{messages,[]},{links,[<0.1772.0>,<0.1769.0>]},{dictionary,[{process_name,{rabbit_amqqueue_process,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>}}},{guid,{{1072198082,2644747048,2619673130,4006597486},1}},{rand_seed,{#{jump => #Fun<rand.3.8986388>,max => 288230376151711743,next => #Fun<rand.2.8986388>,type => exsplus},[14465161831692881|57586813653482372]}},{{ch,<0.1767.0>},{cr,<0.1767.0>,#Ref<0.1326239266.4150788097.144685>,{0,{[],[]}},1,{queue,[],[],0},{qstate,<0.1766.0>,dormant,{0,nil}},0}}]},{trap_exit,true},{status,running},{heap_size,2586},{stack_size,27},{reductions,22278}], [{neighbour,[{pid,<0.1773.0>},{registered_name,[]},{initial_call,{gm,init,['Argument__1']}},{current_function,{gen_server2,process_next_msg,1}},{ancestors,[<0.1772.0>,<0.1770.0>,<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{links,[<0.1772.0>]},{trap_exit,false},{status,waiting},{heap_size,376},{stack_size,9},{reductions,76511681},{current_stacktrace,[{gen_server2,process_next_msg,1,[{file,"src/gen_server2.erl"},{line,685}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}]},{neighbour,[{pid,<0.1772.0>},{registered_name,[]},{initial_call,{rabbit_mirror_queue_coordinator,init,['Argument__1']}},{current_function,{erlang,hibernate,3}},{ancestors,[<0.1770.0>,<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{links,[<0.1770.0>,<0.1773.0>]},{trap_exit,false},{status,waiting},{heap_size,240},{stack_size,0},{reductions,339},{current_stacktrace,[]}]}]
2022-09-19 10:59:00.978 [error] <0.1770.0> **CRASH REPORT Process** <0.1770.0> with 2 neighbours exited with reason: no match of right hand value {error,not_found} in rabbit_amqqueue_process:i/2 line 1151 in gen_server2:terminate/3 line 1183
2022-09-19 10:59:00.979 [error] <0.1769.0> Supervisor {<0.1769.0>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},true,false,...}, declare, <0.1768.0>) at <0.1770.0> exit with reason no match of right hand value {error,not_found} in rabbit_amqqueue_process:i/2 line 1151 in context child_terminated
2022-09-19 10:59:00.979 [error] <0.1761.0> Error on AMQP connection <0.1761.0> (10.244.1.20:45060 -> 10.244.0.39:5672, vhost: '/', user: 'admin', state: running), channel 1:
{{{{badmatch,{error,not_found}},
[{rabbit_amqqueue_process,i,2,
[{file,"src/rabbit_amqqueue_process.erl"},
{line,1151}]},
{rabbit_amqqueue_process,'-infos/2-fun-0-',4,
[{file,"src/rabbit_amqqueue_process.erl"},
{line,1070}]},
{lists,foldr,3,[{file,"lists.erl"},{line,1276}]},
{rabbit_amqqueue_process,emit_stats,2,
[{file,"src/rabbit_amqqueue_process.erl"},
{line,1177}]},
{rabbit_amqqueue_process,handle_info,2,
[{file,"src/rabbit_amqqueue_process.erl"},
{line,1683}]},
{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]},
{gen_server2,call,
[<0.1770.0>,
{basic_cancel,<0.1767.0>,<<"None1">>,
{'basic.cancel_ok',<<"None1">>},
<<"admin">>},
infinity]}},
[{gen_server2,call,3,[{file,"src/gen_server2.erl"},{line,346}]},
{rabbit_amqqueue,basic_cancel,6,
[{file,"src/rabbit_amqqueue.erl"},{line,1770}]},
{rabbit_misc,with_exit_handler,2,[{file,"src/rabbit_misc.erl"},{line,532}]},
{rabbit_channel,handle_method,3,
[{file,"src/rabbit_channel.erl"},{line,1541}]},
{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,630}]},
{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
2022-09-19 10:59:00.979 [error] <0.1767.0> ** Generic server <0.1767.0> terminating
Cluster Status========================================
$ rabbitmq-diagnostics check_running
Checking if RabbitMQ is running on node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local ...
RabbitMQ on node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local is fully booted and running
$ rabbitmq-diagnostics ping
Will ping rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Ping succeeded
$ rabbitmqctl cluster_status
Cluster status of node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local ...
Basics
Cluster name: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
Disk Nodes
rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
Running Nodes
rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
Versions
rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3
Maintenance status
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: drop_unroutable_metric, state: disabled
Flag: empty_basic_get_metric, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: virtual_host_metadata, state: enabled
$ rabbitmq-diagnostics erlang_cookie_sources
Listing Erlang cookie sources used by CLI tools...
Cookie File
Effective user: (none)
Effective home directory: /opt/bitnami/rabbitmq/.rabbitmq
Cookie file path: /opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie
Cookie file exists? true
Cookie file type: regular
Cookie file access: read
Cookie file size: 41
Cookie CLI Switch
--erlang-cookie value set? false
--erlang-cookie value length: 0
Env variable (Deprecated)
RABBITMQ_ERLANG_COOKIE value set? false
RABBITMQ_ERLANG_COOKIE value length: 0
We are using a K8s cluster in Azure; the Kubernetes version is 1.22.6, with 6 nodes running Ubuntu 18.04.6 LTS and kernel version 5.4.0-1089-azure.
Please suggest.
Can you share your rabbitmq helm values?
> Can you share your rabbitmq helm values?
@cognifloyd & @armab,
We have the following Helm values/configuration for RabbitMQ. We previously had the values below commented out; we uncommented them and upgraded the StackStorm deployment, but the issue is the same and pack installation is still not progressing.
## RabbitMQ configuration (3rd party chart dependency)
##
## For values.yaml reference:
## https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq
##
rabbitmq:
  # Change to `false` to disable in-cluster rabbitmq deployment.
  # Specify your external [messaging] connection parameters under st2.config
  enabled: true
  clustering:
    # On unclean cluster restarts forceBoot is required to cleanup Mnesia tables (see: https://github.com/helm/charts/issues/13485)
    # Use it only if you prefer availability over integrity.
    forceBoot: true
  # Authentication Details
  auth:
    username: admin
    # TODO: Use default random 10 character password, but need to fetch this string for use by downstream services
    password: 9jS+w1u07NbHtZke1m+jW4Cj
    # Up to 255 character string, should be fixed so that re-deploying the chart does not fail (see: https://github.com/helm/charts/issues/12371)
    # NB! It's highly recommended to change the default insecure rabbitmqErlangCookie value!
    erlangCookie: 8MrqQdCQ6AQ8U3MacSubHE5RqkSfvNaRHzvxuFcG
  # RabbitMQ Memory high watermark. See: http://www.rabbitmq.com/memory.html
  # Default values might not be enough for StackStorm deployment to work properly. We recommend to adjust these settings for you needs as well as enable Pod memory limits via "resources".
  rabbitmqMemoryHighWatermark: 512MB
  rabbitmqMemoryHighWatermarkType: absolute
  persistence:
    enabled: true
  # Enable Queue Mirroring between nodes
  # See https://www.rabbitmq.com/ha.html
  # This code block is commented out waiting for
  # https://github.com/bitnami/charts/issues/4635
  loadDefinition:
    enabled: true
    existingSecret: "{{ .Release.Name }}-rabbitmq-definitions"
  extraConfiguration: |
    load_definitions = /app/rabbitmq-definitions.json
  # We recommend to set the memory limit for RabbitMQ-HA Pods in production deployments.
  # Make sure to also change the rabbitmqMemoryHighWatermark following the formula:
  #   rabbitmqMemoryHighWatermark = 0.4 * resources.limits.memory
  resources: {}
  # number of replicas in the rabbit cluster
  replicaCount: 3
  # As RabbitMQ enabled prometheus operator monitoring by default, disable it for non-prometheus users
  metrics:
    enabled: false
StackStorm Logs=========================================================
2022-09-25 05:18:12,846 INFO [-] 99e478fd-ee37-4b06-9aec-d91772c95dec - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='99e478fd-ee37-4b06-9aec-d91772c95dec')
2022-09-25 05:18:12,847 INFO [-] 99e478fd-ee37-4b06-9aec-d91772c95dec - 404 50 0.635ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.635,content_length=50,request_id='99e478fd-ee37-4b06-9aec-d91772c95dec')
2022-09-25 05:25:48,426 INFO [-] 3fd71e07-480e-42b6-8806-e9c92fe42e5e - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='3fd71e07-480e-42b6-8806-e9c92fe42e5e')
2022-09-25 05:25:48,426 INFO [-] 3fd71e07-480e-42b6-8806-e9c92fe42e5e - 404 50 0.579ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.579,content_length=50,request_id='3fd71e07-480e-42b6-8806-e9c92fe42e5e')
2022-09-25 05:37:48,587 INFO [-] ad61af77-a62e-4ee4-9844-a09dfdc4268a - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='ad61af77-a62e-4ee4-9844-a09dfdc4268a')
2022-09-25 05:37:48,588 INFO [-] ad61af77-a62e-4ee4-9844-a09dfdc4268a - **404** 50 0.531ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.531,content_length=50,request_id='ad61af77-a62e-4ee4-9844-a09dfdc4268a')
2022-09-26 05:17:02,834 INFO [-] Sleeping for 600 seconds before next garbage collection...
File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/connection.py", line 535, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
listener(*args)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/channel.py", line 277, in _on_close
raise error_for_code(
amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue 'st2.preinit' in vhost '/' is down or inaccessible
2
2022-09-23 11:56:33,564 INFO [-] The status of action execution is changed from requested to scheduled. <LiveAction.id=632d9ef128e8de20488a169e, ActionExecution.id=632d9ef128e8de20488a169f>
return self.channels[channel_id].dispatch_method(
File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
listener(*args)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/channel.py", line 277, in _on_close
raise error_for_code(
amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue **'st2.preinit' in vhost '/' is down or inaccessible**
2