
Azure Event Hub Integration doesn't reconnect with connection loss

Open Gubischubi opened this issue 3 years ago • 2 comments

I'm facing connection loss with the Azure Event Hub Integration (Thingsboard PE). This issue often occurs overnight (maybe caused by update mechanisms or something similar).

  • No more data arrives in Thingsboard (and no outgoing data is visible at the event hub).
  • However, incoming data is still visible at the event hub, so the problem is not the device software or the Event Hub: Thingsboard suddenly loses the connection.

As a workaround, I can go to Integration -> myAzureEventHub, set Debug = true and then set it back to Debug = false, and the data flow works again.

I have to do that for each Integration in every Tenant. Please help! I'm losing data.

I downloaded some logs from the cluster:

  • Error starts at time 03:33:49,673-03:33:49,722 (thingsboard_01_07_22.log)
  • then no more incoming data
  • Error ends at time 06:06:00 (tb-node-0-server_01_07_22.log) because the integration is set to Debug = true and then you can see the logs for the EventHub

Here the logs: tb-node-0-server_01_07_22.log

IoT Event Hub Chart Integration_Problem_01_07_22

Gubischubi avatar Jul 26 '22 14:07 Gubischubi

Maybe it is a good idea to enable Debug for the integration to see more details. Any messages in AzureEventHub?

In your logs I can't see anything related, only timeouts, and why they happen still needs to be checked.

Backdraft007 avatar Jul 27 '22 05:07 Backdraft007

Hi Backdraft007, thank you for your fast answer! We have now activated Debug for the integration and are waiting for the next connection drop.

In AzureEventHub I cannot see anything, do you have a hint where to look?

I added the log file where the error started at time 03:33:49,673-03:33:49,722 (thingsboard_01_07_22.log). Maybe this helps?

thingsboard_0107_22.log

DanielWi92 avatar Aug 02 '22 10:08 DanielWi92

Hi all, we have now seen the error many times, and it recurs consistently after a few hours or days.

Start of error: the data loss begins at 02:22:08,658. From then on, no data is received from EventHub, although the data is available there. End of error: I disabled Debug and enabled it again around 09:18; the log shows this Lifecycle change at 09:18:55,718. Afterwards the data is received again.
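A stall window like the one described above can be located by looking for unusually long pauses between consecutive telemetry timestamps. The following is a minimal sketch, not part of ThingsBoard: `find_gaps` is a hypothetical helper, the 10-minute threshold is an assumption, and only the two timestamps 02:22:08 and 09:18:55 come from this thread (the other sample values are made up for illustration).

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold):
    """Return (last_seen, next_seen) pairs that are further apart than threshold."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


if __name__ == "__main__":
    # Illustrative arrival times; only 02:22:08 and 09:18:55 are taken from the thread.
    arrivals = [
        datetime(2022, 9, 2, 2, 21, 8),
        datetime(2022, 9, 2, 2, 22, 8),
        datetime(2022, 9, 2, 9, 18, 55),
        datetime(2022, 9, 2, 9, 19, 55),
    ]
    for start, end in find_gaps(arrivals, timedelta(minutes=10)):
        print(f"no data between {start} and {end}")
```

The threshold should be chosen well above the normal reporting interval of the devices, otherwise ordinary pauses are reported as stalls.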

Can you see anything in the logs?

thingsboard_02_09_22.log

DanielWi92 avatar Sep 02 '22 10:09 DanielWi92

2022-09-02 02:22:07,972 [tb-rule-engine-consumer-59-thread-11 | QK(Main,TB_RULE_ENGINE,system)-8] INFO o.t.s.s.q.DefaultTbRuleEngineConsumerService - [84488a40-721c-11ec-a7ee-e9b15f6548a6] Failed to process message: TbMsg(queueName=Main, id=9263d5c6-e238-4c6f-87ca-9c0a9e7c3f22, ts=1662085327616, type=POST_TELEMETRY_REQUEST, originator=7fc5a830-b0fc-11ec-887f-436f81b68bfb, customerId=null, metaData=TbMsgMetaData(data={deviceType=TX9, deviceName=172.20.16.142, ts=1662085321000}), dataType=JSON, data={"requester/stateMessage":"63: \u0027set\u0027 object has no attribute \u0027items\u0027"}, ruleChainId=892f35a0-af5b-11ec-887f-436f81b68bfb, ruleNodeId=6bd9eaf0-caf3-11ec-a771-d1504450a03b, ctx=org.thingsboard.server.common.msg.TbMsgProcessingCtx@7e080ad9, callback=org.thingsboard.server.common.msg.queue.TbMsgCallback$1@6ea82529), Last Rule Node: [RuleChain: Tx9 Rule Chain|RuleNode: Rf Warning/Fail Script(6bd9eaf0-caf3-11ec-a771-d1504450a03b)]

Check this rule chain and the rule node. Enable debug on the rule node and check what went wrong.
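The rule chain and rule node to check can be read from the end of the "Failed to process message" line quoted above. As a minimal sketch, assuming the log keeps the `Last Rule Node: [RuleChain: ...|RuleNode: ...(id)]` layout seen in that line, a small script can pull these fields out of a larger log file. The function name `find_failing_rule_node` is hypothetical and not part of ThingsBoard.

```python
import re

# Matches the trailing part of the log line quoted in this thread, e.g.
# "Last Rule Node: [RuleChain: Tx9 Rule Chain|RuleNode: Rf Warning/Fail Script(<uuid>)]"
LAST_NODE_RE = re.compile(
    r"Last Rule Node: \[RuleChain: (?P<chain>[^|]+)\|"
    r"RuleNode: (?P<node>.+)\((?P<node_id>[0-9a-f-]{36})\)\]"
)


def find_failing_rule_node(line):
    """Return (rule_chain, rule_node, rule_node_id), or None if the line has no match."""
    match = LAST_NODE_RE.search(line)
    if match is None:
        return None
    return (
        match.group("chain").strip(),
        match.group("node").strip(),
        match.group("node_id"),
    )


if __name__ == "__main__":
    sample = (
        "... Failed to process message: TbMsg(...), Last Rule Node: "
        "[RuleChain: Tx9 Rule Chain|RuleNode: Rf Warning/Fail Script"
        "(6bd9eaf0-caf3-11ec-a771-d1504450a03b)]"
    )
    print(find_failing_rule_node(sample))
```

Running this over every line of a log and counting the results per rule node gives a quick view of which node fails most often.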

Backdraft007 avatar Sep 02 '22 11:09 Backdraft007

Hi Backdraft007, thank you for the hint. There was a mistake in the rule chain and we fixed it. So far the error hasn't occurred again. Do you think that such an error in the rule chain can affect the EventHub integration?

DanielWi92 avatar Sep 05 '22 07:09 DanielWi92

I think so.

Backdraft007 avatar Sep 05 '22 07:09 Backdraft007

OK, but in my opinion this should not occur, so shouldn't this issue be labeled as a bug?

DanielWi92 avatar Sep 05 '22 07:09 DanielWi92

I think it is not a bug. When a rule chain or rule node produces an error, the error is reported back to the originator. So the integration passes it on to Azure, so that Azure knows there is an error.

Backdraft007 avatar Sep 05 '22 08:09 Backdraft007

Yes that's correct but afterwards the integration with Azure stopped working even though correct data would arrive. That is not optimal I think, but the mistake is on our side.

DanielWi92 avatar Sep 05 '22 10:09 DanielWi92

Yes and no. :) It is better to stop a service when it throws errors than to pass on wrong data or crash another service. Just my opinion.

Backdraft007 avatar Sep 05 '22 10:09 Backdraft007

Yes, you are correct, but the "Azure EventHub" service is not the cause of the error. It only ensures that data flows to Thingsboard; it is not responsible for the format of the data.

One last question, do you think that one error in a rule chain affects only the Integration of the tenant where the error occurs or the Azure integrations of all tenants?

DanielWi92 avatar Sep 05 '22 10:09 DanielWi92

I cannot test it at the moment, but I think only the tenant of the integration is affected.

Backdraft007 avatar Sep 05 '22 13:09 Backdraft007

Yes, I saw the error today on another tenant, which also has a mistake in its rule chain, but the other tenants weren't affected.

DanielWi92 avatar Sep 06 '22 07:09 DanielWi92

That sounds good. I think you can close this issue. :)

Backdraft007 avatar Sep 06 '22 10:09 Backdraft007

Yes thank you very much for your help!

DanielWi92 avatar Sep 06 '22 12:09 DanielWi92

Hi Backdraft007, unfortunately I realized that the connection breaks down even without errors in the rule chain. I have attached a log file again, which shows the error at 2022-09-28 03:44:05,826. Do you have any idea what causes the error?

tb-node-0-server_28_09_22.log

DanielWi92 avatar Sep 28 '22 09:09 DanielWi92

@Backdraft007 Do you have any idea?

Btw thank you very much for your help so far.

Greetings

Gubischubi avatar Nov 10 '22 10:11 Gubischubi

I cannot find an error at your timestamp. But at 2022-09-28 04:02:10,907 there is an error.

Backdraft007 avatar Nov 10 '22 10:11 Backdraft007

Hi Backdraft007, thank you for your answer!

The timestamp I gave marks the point from which no more data arrived from EventHub:

2022-09-28 03:44:05,826 [reactor-executor-24] INFO c.a.m.e.i.ManagementChannel - Management endpoint state: CLOSED

The next line indicates that something is wrong with the integration and that it will try to reopen a new AMQP connection to the EventHub:

2022-09-28 03:44:05,924 [reactor-executor-24] INFO c.a.m.e.i.AmqpReceiveLinkProcessor - Receive link endpoint states are closed. Requesting another.

But it seems that the reconnection was not successful, and afterwards the connection is broken:

2022-09-28 03:44:05,926 [reactor-executor-24] INFO c.a.m.e.i.AmqpReceiveLinkProcessor - Error on receive link null

Yes you are correct that at the timestamp 2022-09-28 04:02:10,907 a rule engine error occurs but at this time the connection is already broken. That's why I concluded that there is no connection between the rule engine error and the connection problem to the EventHub. Would you agree with that?
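The ordering argument above (link closed first, rule engine error later) can be checked mechanically on a full log file. The sketch below is only an illustration: it assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the lines quoted in this thread, and it treats the three quoted message fragments as markers of a lost receive link, which is an assumption rather than documented behaviour of the Azure SDK. The helper names are hypothetical.

```python
from datetime import datetime

# Message fragments copied from the log lines quoted in this thread.
LINK_FAILURE_MARKERS = (
    "Management endpoint state: CLOSED",
    "Receive link endpoint states are closed",
    "Error on receive link",
)


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp, or return None."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def first_link_failure(lines):
    """Return the timestamp of the first line containing a link failure marker."""
    for line in lines:
        if any(marker in line for marker in LINK_FAILURE_MARKERS):
            ts = parse_ts(line)
            if ts is not None:
                return ts
    return None


if __name__ == "__main__":
    log = [
        "2022-09-28 03:44:05,826 [reactor-executor-24] INFO c.a.m.e.i.ManagementChannel - Management endpoint state: CLOSED",
        "2022-09-28 03:44:05,926 [reactor-executor-24] INFO c.a.m.e.i.AmqpReceiveLinkProcessor - Error on receive link null",
    ]
    rule_engine_error = datetime(2022, 9, 28, 4, 2, 10, 907000)
    link_lost = first_link_failure(log)
    print("link lost before rule engine error:", link_lost < rule_engine_error)
```

If the first marker always precedes the first rule engine error across several incidents, that supports treating the two problems as independent.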

DanielWi92 avatar Nov 10 '22 12:11 DanielWi92

Is there any solution to that issue?

mikepetersyn avatar Feb 29 '24 12:02 mikepetersyn

Hi @mikepetersyn ! Do you have the same problem on Thingsboard PE?

AndriichnekoDm avatar Feb 29 '24 13:02 AndriichnekoDm

Hi,

Integrations in Thingsboard are only available with the PE version, so yes the problem occurs with Thingsboard PE.

Best regards, Daniel


DanielWi92 avatar Mar 06 '24 17:03 DanielWi92