[BUG] Getting `coreMQTT: Collision` when starting demo
Describe the bug I was able to create/download the AWS certs, build the firmware, and successfully connect. However, I keep getting this error in the log:
E (4203397) coreMQTT: Collision when adding PacketID=3 at index=0.
I (4203407) sub_pub_unsub_demo: coreMQTT-Agent connected.
I (4203407) temp_sub_pub_and_led_control_demo: coreMQTT-Agent connected.
I (4203417) ota_over_mqtt_demo: coreMQTT-Agent connected. Resuming OTA agent.
I (4203427) core_mqtt_agent_manager: coreMQTT-Agent connected.
E (4203477) network_transport: Connection closed
E (4203477) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTRecvFailed
I (4203477) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (4203487) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (4203497) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent.
I (4203507) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
I (4203517) core_mqtt_agent_manager: TLS connection was disconnected.
I (4204657) core_mqtt_agent_manager: TLS connection established.
I (4205027) coreMQTT: MQTT connection established with the broker.
I (4205027) core_mqtt_agent_manager: Session present: 1
E (4205037) coreMQTT: Collision when adding PacketID=3 at index=0.
I (4205047) sub_pub_unsub_demo: coreMQTT-Agent connected.
I (4205047) temp_sub_pub_and_led_control_demo: coreMQTT-Agent connected.
I (4205057) ota_over_mqtt_demo: coreMQTT-Agent connected. Resuming OTA agent.
I (4205057) core_mqtt_agent_manager: coreMQTT-Agent connected.
W (4205067) sub_pub_unsub_demo: Error or timed out waiting for ack to subscribe message 1. Re-attempting subscribe.
I (4205077) sub_pub_unsub_demo: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 1
E (4205107) network_transport: Connection closed
E (4205107) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTRecvFailed
I (4205107) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (4205127) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (4205137) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent.
I (4205147) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
I (4205157) core_mqtt_agent_manager: TLS connection was disconnected.
I (4206107) core_mqtt_agent_manager: TLS connection established.
I (4206357) coreMQTT: MQTT connection established with the broker.
I (4206357) core_mqtt_agent_manager: Session present: 1
System information
- Hardware board: esp32-c3
- IDE used: vscode
- Operating System: MacOS
- Code version: my own fork from the current version
- Project/Demo: mqtt_demo_mutual_auth
Expected behavior I don't expect to see MQTT message collisions in the log
Screenshots or console output See above
Additional context I had some trouble getting the proper policies for the MQTT topics. I ended up using this one:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iot:Connect",
"Resource": "arn:aws:iot:us-east-1:657592885771:client/iot-reference-esp32-test"
},
{
"Effect": "Allow",
"Action": [
"iot:Publish",
"iot:Receive",
"iot:Subscribe"
],
"Resource": [
"arn:aws:iot:us-east-1:657592885771:topic/filter/*",
"arn:aws:iot:us-east-1:657592885771:topicfilter/filter/*"
]
}
]
}
Hmm. It's possible you may need to tweak some network settings. Collision is most likely caused by either the device not acking a packet from AWS before AWS re-sends it (i.e. AWS sends a packet, device receives packet but sends its ack back too late so AWS resends the packet the device had already received, causing collision) or AWS not sending requested packets back to the device in time before the device retries (i.e. device requests some sort of response, AWS sends the response but device does not receive the response before it retries, device receives the initial response that was sent, and later gets the second response it requested on retry, leading to collision). I will look into how to adjust this project's settings so you can try some things.
Best,
Jason Carroll
Hmm. It's possible you may need to tweak some network settings. Collision is most likely caused by either the device not acking a packet from AWS before AWS re-sends it (i.e. AWS sends a packet, device receives packet but sends its ack back too late so AWS resends the packet the device had already received, causing collision) or AWS not sending requested packets back to the device in time before the device retries (i.e. device requests some sort of response, AWS sends the response but device does not receive the response before it retries, device receives the initial response that was sent, and later gets the second response it requested on retry, leading to collision). I will look into how to adjust this project's settings so you can try some things.
Best,
Jason Carroll
Thanks!
Hi all I got the same kind of error. It's the first time I see it, while using the same wifi connection for almost a year. I see the coreMQTT collision error during OTA download; I did the same test with the same code a few months ago and didn't get any of these errors. Was there any update in the AWS job manager?
here my log:
2024-12-25 19:17:16.497] I (232682) MAIN: Send TOPIC: FLTE20/EWB/out/memcheck MESSAGE: {"time": 232366, "portime": 232, "HeapFree": 61856, "MinHeapFree": 5408, "fwVer": "1.8.0", "rssi": -64} [2024-12-25 19:17:16.566] I (232752) OTA_OVER_MQTT: Sent PUBLISH packet to broker $aws/things/FLTE20/jobs/start-next to broker. [2024-12-25 19:17:16.566] [2024-12-25 19:17:16.567] [2024-12-25 19:17:18.112] E (234302) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout [2024-12-25 19:17:18.113] E (234302) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout [2024-12-25 19:17:18.113] I (234302) OTA_OVER_MQTT: coreMQTT-Agent disconnected. Suspending OTA agent. [2024-12-25 19:17:18.113] I (234302) core_mqtt_agent_manager: coreMQTT-Agent disconnected. [2024-12-25 19:17:18.113] I (234302) MQTT_UTILS: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. [2024-12-25 19:17:18.133] I (234322) core_mqtt_agent_manager: TLS connection was disconnected. [2024-12-25 19:17:18.492] I (234682) MAIN: Publish MemCheck SUCCESS [2024-12-25 19:17:18.492] I (234682) MAIN: Check for connection [2024-12-25 19:17:21.543] E (237732) esp-tls: [sock=56] select() timeout [2024-12-25 19:17:21.544] E (237732) esp-tls: Failed to open new connection [2024-12-25 19:17:21.852] I (238042) core_mqtt_agent_manager: Retry attempt 1. [2024-12-25 19:17:24.961] I (241142) core_mqtt_agent_manager: TLS connection established. [2024-12-25 19:17:25.412] I (241602) coreMQTT: MQTT connection established with the broker. [2024-12-25 19:17:25.417] I (241602) core_mqtt_agent_manager: Session present: 1 [2024-12-25 19:17:25.417] [2024-12-25 19:17:25.417] E (241602) coreMQTT: Collision when adding PacketID=27 at index=0. [2024-12-25 19:17:25.421] I (241602) OTA_OVER_MQTT: coreMQTT-Agent connected. Resuming OTA agent. [2024-12-25 19:17:25.424] I (241602) core_mqtt_agent_manager: coreMQTT-Agent connected. [2024-12-25 19:17:25.424] I (241602) MQTT_UTILS: coreMQTT-Agent connected. [2024-12-25 19:17:25.754] I (241942) coreMQTT: Ack packet deserialized with result: MQTTSuccess. [2024-12-25 19:17:25.756] I (241942) coreMQTT: State record updated. New state=MQTTPublishDone. [2024-12-25 19:17:25.756] I (241942) MQTT_UTILS: Task "MQTT_APPL" QoS1 Publish 23 (P23:F0). SUCCESS [2024-12-25 19:17:25.756] I (241942) MQTT_UTILS: Send Response into respQueue: PublishMessageId=23, retVal=1 [2024-12-25 19:17:25.853] I (242042) MQTT_UTILS: xMQTTPubishMessagesQueue: send MQTT Message (Qos=1), Topic: FLTE20/EWB/out/memcheck, Message: {"time": 232366, "portime": 232, "HeapFree": 61856, "MinHeapFree": 5408, "fwVer": "1.8.0", "rssi": -64}, timeout: 2000 msecs
Hi @WilliamFrasson,
Hmm... well that gives another data point. I apologize for the delay - I am still looking into this, but am waiting on an ESP32-C3 to come in. So what @WilliamFrasson mentioned (in that it worked a few months ago) - I wonder if it has anything to do with this PR https://github.com/FreeRTOS/iot-reference-esp32/pull/104/files. Could you try altering the values of the timeout and see if that fixes anything?
Best,
Jason Carroll
I was not getting this with esp32c6 devkit yesterday, but now see it when I switched to esp32s3 devkit. Power cycling the devkit, erase/reflash, closing/reopening connection on aws had little effect (see below).
I switched back to esp32c6 and confirm it's still ok. Then back to esp32s3, and now it's ok. So it may be intermittent, I will update if I find the failure mode.
I applied the same aws certs to the esp32s3 as I did the esp32c6, and running main branch as of today [f4c15d5].
I also notice that when it's failing as shown below the iteration count that appears in aws console seems to alternate between a low single digit count and a count offset by about 10430, example, 1, 14030, 2, 10431. This is not consistent, some sessions it seemed to count normally.
I can control the LED but it takes several pub attempts to get through to it when it's in this state..
I (144033) sub_pub_unsub_demo: coreMQTT-Agent connected. I (144043) ota_over_mqtt_demo: coreMQTT-Agent connected. Resuming OTA agent. I (144043) core_mqtt_agent_manager: coreMQTT-Agent connected. I (144053) temp_sub_pub_and_led_control_demo: coreMQTT-Agent connected. E (145363) network_transport: Connection closed E (145363) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTRecvFailed I (145363) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. I (145373) core_mqtt_agent_manager: TLS connection was disconnected. I (145373) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent. I (145393) core_mqtt_agent_manager: coreMQTT-Agent disconnected. I (145393) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. I (146413) core_mqtt_agent_manager: TLS connection established. I (146793) coreMQTT: MQTT connection established with the broker. I (146793) core_mqtt_agent_manager: Session present: 1
I (146793) sub_pub_unsub_demo: coreMQTT-Agent connected. I (146803) ota_over_mqtt_demo: coreMQTT-Agent connected. Resuming OTA agent. I (146803) core_mqtt_agent_manager: coreMQTT-Agent connected. I (146813) temp_sub_pub_and_led_control_demo: coreMQTT-Agent connected. I (146823) temp_sub_pub_and_led_control_demo: Sending publish request to agent with message "{"temperatureSensor":{ "taskName": "TempSubPubLED", "temperatureValue": 0.000000, "iteration": 22}}" on topic "/filter/TempSubPubLED" I (146843) temp_sub_pub_and_led_control_demo: Task TempSubPubLED waiting for publish 22 to complete. I (146973) coreMQTT: Ack packet deserialized with result: MQTTSuccess. I (146973) coreMQTT: State record updated. New state=MQTTPublishDone. I (146983) temp_sub_pub_and_led_control_demo: Rx'ed ack for QoS1 publish from Tx to /filter/TempSubPubLED (P23:F0). I (147013) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess. I (147013) coreMQTT: State record updated. New state=MQTTPubAckSend. I (147023) temp_sub_pub_and_led_control_demo: Received incoming publish message {"temperatureSensor":{ "taskName": "TempSubPubLED", "temperatureValue": 0.000000, "iteration": 22}} E (148133) network_transport: Connection closed E (148133) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTRecvFailed I (148133) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. I (148143) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent. I (148153) core_mqtt_agent_manager: coreMQTT-Agent disconnected. I (148153) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. I (148163) core_mqtt_agent_manager: TLS connection was disconnected. I (149203) core_mqtt_agent_manager: TLS connection established. I (149383) coreMQTT: MQTT connection established with the broker. I (149383) core_mqtt_agent_manager: Session present: 1
@orlyprofili When you say iteration count - what exactly are you referring to?
@orlyprofili When you say iteration count - what exactly are you referring to?
In aws console webview, viewing the subscribed temperature message as they came in, there is an iteration counter as part of each temperature message. Normally it increments, but when it was acting up, it patterned something like what I described above.
@orlyprofili I still have not been able to reproduce this error... I will take a deeper look at the code next week... seems to be some bug there with the iteration count ... I wonder if commands are going out of scope to the coreMQTT-Agent (when you make a command to coreMQTT-Agent one must ensure that the memory passed to the command stays in scope - otherwise bad things ensue as memory is corrupted.
@orlyprofili I still have not been able to reproduce this error... I will take a deeper look at the code next week... seems to be some bug there with the iteration count ... I wonder if commands are going out of scope to the coreMQTT-Agent (when you make a command to coreMQTT-Agent one must ensure that the memory passed to the command stays in scope - otherwise bad things ensue as memory is corrupted.
Thanks. I haven't observed it since either. Will share if it comes up again.
@orlyprofili,
Have not had time to look closer into it yet. I am going to try to next week
Best.
@orlyprofili, from inspection I did not see any places where coreMQTT-Agent command contexts went out of scope. I also have not been able to replicate your issue.... definitely seems like it shows up more on the S3. Hmm.. I think I will check task stack settings next... wonder if something is possibly bleeding over from one task to another.
@jasonpcarroll thanks for looking, I've shifted to some other tasks in the past week or so. If I observe this again when I come back to this stuff, I'll post here.
@orlyprofili As discussed above , we are closing this issue for now. If you observe the same issue again , please create a new issue. Thanks.