Azure Event Hub Integration doesn't reconnect with connection loss
I'm facing connection loss with the Azure Event Hub Integration (Thingsboard PE). This issue often occurs overnight (maybe caused by update mechanisms or something similar).
- no more data arrives in Thingsboard (also no outgoing data is visible at the event hub)
- However, incoming data is still visible at the event hub => the problem is not the device software or the EventHub; Thingsboard is suddenly losing the connection.
As a workaround I can click in Integration -> myAzureEventHub -> Debug = true and then back to Debug = false, and the data flow works again.
I have to do that for each Integration in every Tenant. Please help! I'm losing data.
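The manual workaround above could in principle be scripted instead of clicked per integration. The following is only a sketch under assumptions that are not confirmed in this thread: that the REST API exposes `GET /api/integration/{id}` and `POST /api/integration`, that the integration object carries a boolean `debugMode` field, that a JWT is passed in the `X-Authorization` header, and that saving through the API restarts the integration the same way the UI toggle does. The helper names `http_call` and `toggle_debug` are hypothetical.

```python
import json
import urllib.request


def http_call(base_url, token, method, path, body=None):
    """Send one JSON request to the REST API and decode the JSON reply.

    The header name and the Bearer format are assumptions; check them
    against the API documentation of your installation.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "X-Authorization": "Bearer " + token,
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def toggle_debug(integration_id, call):
    """Flip debugMode and flip it back, mirroring the manual workaround.

    `call(method, path, body)` performs the request, so the logic can be
    exercised without a server. Paths and field name are assumptions.
    """
    integration = call("GET", "/api/integration/" + integration_id, None)
    original = bool(integration.get("debugMode", False))
    for value in (not original, original):
        integration["debugMode"] = value
        # Re-use the saved entity so the second save sees current data.
        integration = call("POST", "/api/integration", integration)
    return integration
```

Against a real server, `call` could be `lambda m, p, b: http_call(base_url, token, m, p, b)`, run once per integration ID in each tenant. This only automates the workaround; it does not address why the receive link fails to recover.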
I downloaded some logs from the cluster:
- Error starts at time 03:33:49,673-03:33:49,722 (thingsboard_01_07_22.log)
- then no more incoming data
- Error ends at time 06:06:00 (tb-node-0-server_01_07_22.log) because the integration is set to Debug = true and then you can see the logs for the EventHub
Here are the logs: tb-node-0-server_01_07_22.log
IoT Event Hub Chart
Maybe it is a good idea to enable Debug for the integration to see more details. Any messages in AzureEventHub?
I can't see anything related in your logs. Only timeouts, but the reason for them still needs to be checked.
Hi Backdraft007, thank you for your fast answer! We have now activated Debug for the integration and are waiting for the next connection drop.
In AzureEventHub I cannot see anything, do you have a hint where to look?
I added the log file where the error started at time 03:33:49,673-03:33:49,722 (thingsboard_01_07_22.log). Maybe this helps?
Hi all, we have now seen the error many times, and it occurs consistently after a few hours or days.
Start of error: the data loss starts at 02:22:08,658. In between, no data is received from the EventHub, although the data is available there. End of error: I disabled Debug and enabled it again around 09:18, and in the log you can see this Lifecycle change at 09:18:55,718. Afterwards the data is received again.
Can you see anything in the logs?
2022-09-02 02:22:07,972 [tb-rule-engine-consumer-59-thread-11 | QK(Main,TB_RULE_ENGINE,system)-8] INFO o.t.s.s.q.DefaultTbRuleEngineConsumerService - [84488a40-721c-11ec-a7ee-e9b15f6548a6] Failed to process message: TbMsg(queueName=Main, id=9263d5c6-e238-4c6f-87ca-9c0a9e7c3f22, ts=1662085327616, type=POST_TELEMETRY_REQUEST, originator=7fc5a830-b0fc-11ec-887f-436f81b68bfb, customerId=null, metaData=TbMsgMetaData(data={deviceType=TX9, deviceName=172.20.16.142, ts=1662085321000}), dataType=JSON, data={"requester/stateMessage":"63: \u0027set\u0027 object has no attribute \u0027items\u0027"}, ruleChainId=892f35a0-af5b-11ec-887f-436f81b68bfb, ruleNodeId=6bd9eaf0-caf3-11ec-a771-d1504450a03b, ctx=org.thingsboard.server.common.msg.TbMsgProcessingCtx@7e080ad9, callback=org.thingsboard.server.common.msg.queue.TbMsgCallback$1@6ea82529), Last Rule Node: [RuleChain: Tx9 Rule Chain|RuleNode: Rf Warning/Fail Script(6bd9eaf0-caf3-11ec-a771-d1504450a03b)]
Check this rule chain and the rule node. Enable debug in the rule node and check what went wrong.
Hi backdraft, thank you for the hint. There was a mistake in the rule chain and we fixed it. So far the error hasn't occurred again. Do you think that such an error in the rule chain can affect the EventHub integration?
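To find which rule chain and rule node to check, the "Failed to process message" lines can be scanned for their trailing `Last Rule Node: [...]` part. This is a minimal sketch that assumes every such line ends in the same shape as the one quoted above; the function name `failing_rule_nodes` is hypothetical.

```python
import re
from collections import Counter

# Matches the tail of the log line quoted above:
#   Last Rule Node: [RuleChain: <chain>|RuleNode: <node>(<uuid>)]
LAST_NODE = re.compile(
    r"Last Rule Node: \[RuleChain: (?P<chain>.+?)\|RuleNode: (?P<node>.+?)"
    r"\((?P<node_id>[0-9a-f-]{36})\)\]"
)


def failing_rule_nodes(lines):
    """Count failures per (rule chain, rule node, node id)."""
    counts = Counter()
    for line in lines:
        if "Failed to process message" not in line:
            continue
        match = LAST_NODE.search(line)
        if match:
            counts[(match["chain"], match["node"], match["node_id"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python failing_nodes.py < thingsboard.log
    for key, n in failing_rule_nodes(sys.stdin).most_common():
        print(n, *key, sep="\t")
```

Sorting by count shows the rule node that fails most often first, which is where enabling debug is most useful.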
I think so.
OK, but in my opinion this should not occur. Should I therefore label this issue as a bug?
I think it is not a bug. When a rule chain/node produces an error, it is reported back to the originator to signal that there is an error. So the integration passes it on to Azure, so that Azure knows there is an error.
Yes that's correct but afterwards the integration with Azure stopped working even though correct data would arrive. That is not optimal I think, but the mistake is on our side.
Yes and no. :) Better to stop a service when it throws errors than to let it deliver wrong data or crash another service. Just my opinion.
Yes, you are correct, but the "Azure EventHub" service is not the cause of the error. It only ensures that data is flowing to Thingsboard; it is not responsible for the format of the data.
One last question, do you think that one error in a rule chain affects only the Integration of the tenant where the error occurs or the Azure integrations of all tenants?
I cannot test it at the moment, but I think only the tenant of the integration is affected.
Yes, today I saw the error on another tenant, which also has a mistake in the rule chain, but the other tenants weren't affected.
That sounds good. I think you can close this issue. :)
Yes thank you very much for your help!
Hi Backdraft007, unfortunately, I realized that the connection breaks down even without errors in the rule chain. I have attached a log file again, which shows the error at time 2022-09-28 03:44:05,826. Do you have any idea what causes the error?
@Backdraft007 Do you have any idea?
Btw thank you very much for your help so far.
Greetings
I cannot find an error at your timestamp. But at 2022-09-28 04:02:10,907 there is an error.
Hi Backdraft007, thank you for your answer!
The timestamp I gave is the time from which no more data was coming from the EventHub: 2022-09-28 03:44:05,826 [reactor-executor-24] INFO c.a.m.e.i.ManagementChannel - Management endpoint state: CLOSED
This line indicates that something is wrong with the integration and it will try to reopen a new AMQP connection to the EventHub: 2022-09-28 03:44:05,924 [reactor-executor-24] INFO c.a.m.e.i.AmqpReceiveLinkProcessor - Receive link endpoint states are closed. Requesting another.
But it seems that reconnection was not successful and afterwards the connection is broken: 2022-09-28 03:44:05,926 [reactor-executor-24] INFO c.a.m.e.i.AmqpReceiveLinkProcessor - Error on receive link null
Yes you are correct that at the timestamp 2022-09-28 04:02:10,907 a rule engine error occurs but at this time the connection is already broken. That's why I concluded that there is no connection between the rule engine error and the connection problem to the EventHub. Would you agree with that?
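To locate the moment the receive link drops in a larger log file, the three messages quoted above can be searched for directly. This is a minimal sketch that assumes each log line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as in the excerpts; `find_link_drops` is a hypothetical helper name.

```python
from datetime import datetime

# Message fragments taken from the log lines quoted above.
MARKERS = (
    "Management endpoint state: CLOSED",
    "Receive link endpoint states are closed",
    "Error on receive link",
)


def find_link_drops(lines):
    """Return (timestamp, marker) for every line containing a marker."""
    hits = []
    for line in lines:
        marker = next((m for m in MARKERS if m in line), None)
        if marker is None:
            continue
        try:
            # The first 23 characters hold the timestamp, e.g.
            # "2022-09-28 03:44:05,826".
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # line without a leading timestamp, skip it
        hits.append((ts, marker))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python link_drops.py < tb-node-0-server.log
    for ts, marker in find_link_drops(sys.stdin):
        print(ts.isoformat(sep=" ", timespec="milliseconds"), marker)
```

Comparing the first hit with the timestamp of the first rule engine error shows which of the two happened earlier, which is the comparison made above.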
Is there any solution to that issue?
Hi @mikepetersyn ! Do you have the same problem on Thingsboard PE?
Hi,
Integrations in Thingsboard are only available with the PE version, so yes the problem occurs with Thingsboard PE.
Best regards, Daniel