rocketmq [Bug] Master broker in DLedger mode did not sync consume offset in time after reboot, causing repeated consumption

Before Creating the Bug Report

[X] I found a bug, not just asking a question, which should be created in GitHub Discussions.
[X] I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
[X] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

CentOS Linux release 7.3.1611

RocketMQ version

4.8.0

JDK Version

1.8

Describe the Bug

In DLedger mode, a master broker that is shut down and then rebooted does not synchronize its consume offsets from the broker that temporarily acted as master while it was down. This can cause repeated consumption.

After reading the code, I believe there is a bug in the synchronization logic. In BrokerController.java, the oneway flag is set to true when the broker registers itself. This means BrokerOuterAPI@registerBrokerAll returns null without waiting for the result, so the code that depends on that result never actually runs. Because the master address is therefore null, SlaveSynchronize@syncAll does not run either, including the body of SlaveSynchronize@syncConsumerOffset.

The synchronization task is scheduled to run every 10 seconds, but before the next sync happens the rebooted broker has already been elected master again. From that point on it stops trying to fetch the newest consume offsets and starts delivering messages from the old, stale offsets.
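The chain described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class names, the `rebootAndSync` helper, the address `temp-master:10911` and the offsets are all hypothetical and are not RocketMQ's real code. It only shows how a oneway registration that returns null leaves the master address unset, so the sync is skipped.

```java
// Simplified, hypothetical model of the flow described in this report.
// None of these types are RocketMQ's real classes.
final class OnewayRegisterSketch {

    /** Stand-in for a registration result that carries the master address. */
    static final class RegisterResult {
        final String masterAddr;

        RegisterResult(String masterAddr) {
            this.masterAddr = masterAddr;
        }
    }

    /** Stand-in for the registration call: a oneway request yields no result. */
    static RegisterResult registerBrokerAll(boolean oneway, String currentMaster) {
        if (oneway) {
            return null; // fire-and-forget: there is no response to read
        }
        return new RegisterResult(currentMaster);
    }

    /** Stand-in for the slave-side synchronization task. */
    static final class SlaveSync {
        String masterAddr; // stays null until a registration result sets it
        long localOffset;

        SlaveSync(long localOffset) {
            this.localOffset = localOffset;
        }

        /** Returns true only if an offset was actually pulled from the master. */
        boolean syncAll(long masterOffset) {
            if (masterAddr == null) {
                return false; // no known master, so the sync is skipped
            }
            localOffset = masterOffset;
            return true;
        }
    }

    /** Registers, runs one sync, and returns the offset the rebooted broker ends up with. */
    static long rebootAndSync(boolean oneway, long staleOffset, long masterOffset) {
        SlaveSync sync = new SlaveSync(staleOffset);
        RegisterResult result = registerBrokerAll(oneway, "temp-master:10911");
        if (result != null) {
            sync.masterAddr = result.masterAddr;
        }
        sync.syncAll(masterOffset);
        return sync.localOffset;
    }

    public static void main(String[] args) {
        System.out.println("oneway=true  -> offset " + rebootAndSync(true, 100L, 250L));
        System.out.println("oneway=false -> offset " + rebootAndSync(false, 100L, 250L));
    }
}
```

In this model the oneway path keeps the stale offset (100), while the path that waits for the result picks up the newer one (250).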

Steps to Reproduce

Run a cluster in DLedger mode.
Shut down the master broker gracefully.
Make sure the consume offset advances during the downtime, then reboot the previous master broker.
Repeated consumption should be observed.

What Did You Expect to See?

The rebooted broker should synchronize the newest consume offset and resume consumption from it correctly.

What Did You See Instead?

Abnormal repeated consumption.

Additional Context

No response

Jan 29 '24 04:01 bxfjb

Can you submit a PR to fix it, @bxfjb?

Mar 01 '24 03:03 cserwen

> Can you submit a PR to fix it, @bxfjb?

Here is the PR; writing the unit test was more complicated than I thought :) https://github.com/apache/rocketmq/pull/7901

Mar 12 '24 08:03 bxfjb

Since the consumption progress is synchronized every 10 seconds, if this mechanism does not change, in theory, repeated consumption will occur as long as a master-slave switch occurs. A synchronization mechanism may be needed to ensure that the consumption progress is synchronized.
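As a rough illustration of why the sync interval matters, the worst case can be estimated as the consume rate multiplied by the interval. This is a back-of-the-envelope sketch: the `worstCaseRepeats` helper and the rate of 200 messages per second are hypothetical, and it assumes the switch happens just before the next periodic sync would have run.

```java
// Hypothetical estimate: how many messages can be re-delivered if a
// master switch happens right before the next periodic offset sync.
final class StaleOffsetWindow {

    /** Upper bound on repeats: everything consumed since the last sync. */
    static long worstCaseRepeats(long consumeRatePerSecond, long syncIntervalMillis) {
        return consumeRatePerSecond * syncIntervalMillis / 1000L;
    }

    public static void main(String[] args) {
        long rate = 200L; // assumed consume rate, messages per second
        System.out.println("10 s interval: up to " + worstCaseRepeats(rate, 10_000L) + " repeats");
        System.out.println(" 3 s interval: up to " + worstCaseRepeats(rate, 3_000L) + " repeats");
    }
}
```

A shorter interval shrinks the window but does not close it, which is why some explicit synchronization at switch time may still be needed.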

Mar 22 '24 02:03 LittleBoy18

> Since the consumption progress is synchronized every 10 seconds, if this mechanism does not change, in theory, repeated consumption will occur as long as a master-slave switch occurs. A synchronization mechanism may be needed to ensure that the consumption progress is synchronized.

Actually, the consumption synchronization period has been changed to 3s in the latest code, so the repeated consumption you mentioned should decrease a lot. To reduce the remaining repeats, #7901 now explicitly calls syncAll after registering the broker information.
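The idea of syncing right after registration, instead of waiting for the next periodic tick, could look roughly like the sketch below. It is not the code from #7901: the `Broker` class, the `OffsetSync` interface, the `onRegistered` callback and the address are hypothetical names used only to show the pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "sync immediately after registration".
// These types are illustrative and are not RocketMQ's real classes.
final class SyncAfterRegisterSketch {

    /** Stand-in for whatever performs the full slave synchronization. */
    interface OffsetSync {
        void syncAll();
    }

    static final class Broker {
        private final OffsetSync sync;
        private String masterAddr;

        Broker(OffsetSync sync) {
            this.sync = sync;
        }

        /** Called when a registration result reports the current master. */
        void onRegistered(String reportedMaster, boolean syncImmediately) {
            boolean changed = reportedMaster != null && !reportedMaster.equals(masterAddr);
            masterAddr = reportedMaster;
            if (changed && syncImmediately) {
                sync.syncAll(); // do not wait for the next periodic tick
            }
        }
    }

    /** Counts how many syncs one registration triggers in this model. */
    static int syncCallsAfterRegister(boolean syncImmediately) {
        AtomicInteger calls = new AtomicInteger();
        Broker broker = new Broker(calls::incrementAndGet);
        broker.onRegistered("temp-master:10911", syncImmediately);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("periodic only:   " + syncCallsAfterRegister(false) + " sync(s) at registration");
        System.out.println("sync on register: " + syncCallsAfterRegister(true) + " sync(s) at registration");
    }
}
```

In this model, the rebooted broker pulls offsets once as soon as it learns the master address, so a quick re-election is less likely to find it holding stale offsets.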

Mar 22 '24 03:03 bxfjb

When the master switches, we also encounter the same problem of repeated consumption. Is there another way to avoid this issue before the patch is merged?

Apr 12 '24 04:04 lsy1990

> When the master switches, we also encounter the same problem of repeated consumption. Is there another way to avoid this issue before the patch is merged?

Nothing usable yet; for now you can merge the patch into your own repository.

Apr 12 '24 06:04 bxfjb