[Bug] Master broker in DLedger mode does not sync the consume offset in time after reboot, causing repeated consumption
Before Creating the Bug Report
- [X] I found a bug, not just asking a question, which should be created in GitHub Discussions.
- [X] I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
- [X] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
CentOS Linux release 7.3.1611
RocketMQ version
4.8.0
JDK Version
1.8
Describe the Bug
In DLedger mode, a master broker that was shut down and rebooted does not synchronize the consume offset with the temporary master broker that took over in the meantime. This can cause repeated consumption.
After reading the code, I believe there is a bug in the synchronization logic:
BrokerController.java:
The oneway flag is set to true, which means that the following code in BrokerOuterAPI@registerBrokerAll returns null without waiting for the result:
As a result, the following marked code never actually runs:
Because masterAddr is null, SlaveSynchronize@syncAll does not run either, including the function body of SlaveSynchronize@syncConsumerOffset:
The synchronization task is scheduled to run every 10 seconds, but before the next sync happens, the rebooted broker has already been elected master again. It therefore stops trying to fetch the newest consume offset and starts pushing messages from the older offset.
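To make the chain of events above concrete, here is a minimal toy model. It is only a sketch: the class, method names, address, and offsets are hypothetical and are not RocketMQ's actual API. It assumes that a oneway registration yields no master address, that the sync step is skipped when the address is null, and that the broker is promoted before any later sync runs.

```java
/** Toy model of the reported failure chain; all names are illustrative, not RocketMQ's API. */
class StaleOffsetSketch {

    /** Models registration: a oneway request returns before any reply arrives. */
    static String registerAndGetMasterAddr(boolean oneway, String currentMasterAddr) {
        return oneway ? null : currentMasterAddr;
    }

    /** Models the sync step: it is skipped when no master address is known. */
    static long syncConsumerOffset(String masterAddr, long localOffset, long masterOffset) {
        if (masterAddr == null) {
            return localOffset; // sync skipped, the stale local offset is kept
        }
        return Math.max(localOffset, masterOffset);
    }

    /** Number of messages delivered again if the broker is promoted at this offset. */
    static long redelivered(long offsetAtPromotion, long latestOffset) {
        return Math.max(0L, latestOffset - offsetAtPromotion);
    }

    public static void main(String[] args) {
        long stale = 100L;   // offset the rebooted broker still has on disk (made-up value)
        long latest = 150L;  // offset reached on the temporary master (made-up value)
        String tempMaster = "192.168.0.2:10911"; // placeholder address

        String addr = registerAndGetMasterAddr(true, tempMaster);
        long offset = syncConsumerOffset(addr, stale, latest);
        System.out.println("oneway=true  redelivered=" + redelivered(offset, latest));

        addr = registerAndGetMasterAddr(false, tempMaster);
        offset = syncConsumerOffset(addr, stale, latest);
        System.out.println("oneway=false redelivered=" + redelivered(offset, latest));
    }
}
```

With these made-up numbers, the oneway path leaves the broker at offset 100, so the 50 messages consumed during the downtime would be delivered again. The non-oneway path catches up to 150 and redelivers nothing.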
Steps to Reproduce
- Run a cluster in DLedger mode.
- Shut down the master broker gracefully.
- Reboot the previous master broker, making sure that the consume offset advances during the downtime.
- Repeated consumption should be observed.
What Did You Expect to See?
The rebooted broker should synchronize the newest consume offset and resume consumption from it correctly.
What Did You See Instead?
Abnormal repeated consumption.
Additional Context
No response
Can you submit a PR to fix it, @bxfjb?
Here is the PR; writing the unit test was more complicated than I thought :) https://github.com/apache/rocketmq/pull/7901
Since the consumption progress is synchronized every 10 seconds, if this mechanism does not change, in theory, repeated consumption will occur as long as a master-slave switch occurs. A synchronization mechanism may be needed to ensure that the consumption progress is synchronized.
Actually, the consumption synchronization period has been changed to 3 s in the latest code, so the repeated consumption you mentioned should decrease a lot. To reduce what remains, #7901 now explicitly calls syncAll after registering broker information.
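As a rough illustration of the idea of syncing right after registration instead of waiting for the next periodic tick, here is a small sketch. It is not the code from #7901: the class `EagerSyncSketch` and its methods are hypothetical, and the real sync work is replaced by a counter so the behaviour is easy to see.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; names are hypothetical and not taken from #7901. */
class EagerSyncSketch {
    private final AtomicInteger syncCount = new AtomicInteger();
    private volatile String masterAddr;
    private volatile boolean slave;

    /** Called when a registration reply (not oneway) names the current master. */
    void onRegistered(String newMasterAddr, boolean isSlave) {
        this.masterAddr = newMasterAddr;
        this.slave = isSlave;
        if (isSlave && newMasterAddr != null) {
            syncAll(); // sync immediately instead of waiting for the next tick
        }
    }

    /** Body of the periodic task, fired every few seconds by some scheduler. */
    void onTimerTick() {
        if (slave && masterAddr != null) {
            syncAll();
        }
    }

    /** Stand-in for the real sync work; here it only counts invocations. */
    private void syncAll() {
        syncCount.incrementAndGet();
    }

    int syncCount() {
        return syncCount.get();
    }

    public static void main(String[] args) {
        EagerSyncSketch broker = new EagerSyncSketch();
        broker.onTimerTick();                            // no master known yet, nothing happens
        broker.onRegistered("192.168.0.2:10911", true);  // placeholder address, syncs at once
        System.out.println("syncs so far: " + broker.syncCount());
    }
}
```

The point of the sketch is only the ordering: the first sync is tied to the registration reply, so a broker that is promoted shortly after reboot has had at least one chance to fetch the newer offsets.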
We also encountered the same repeated-consumption problem when the master switches. Is there another way to avoid this issue before the patch is merged?
Nothing usable yet. In the meantime, you could merge the patch into your own repository.