
What will happen if metadata differs between Name Servers in one name server cluster?

jason199101 opened this issue on Aug 10 '22 (status: open)

Here is the situation: there is a name server cluster with ns1 and ns2, and a broker cluster with b1, b2, and b3. If ns1 is disconnected from b3 and ns2 is disconnected from b2, then the metadata in ns1 only includes b1 and b2, while ns2 only has b1 and b3.

Now, if a producer connects to ns1, it will only send messages to b1 and b2. If a consumer connects to ns2, it will only consume messages from b1 and b3. This means that none of the messages sent to b2 will be consumed. Is that correct?
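The scenario above can be reproduced with a toy Java simulation. This is a minimal sketch, not RocketMQ code: each name server's "view" is assumed to be just the set of broker names it knows for one topic, the producer is assumed to spread messages round-robin over the brokers it sees, and the class and method names (`RouteViewDemo`, `send`, `unconsumed`) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a name server "view" is the list/set of broker names it reports for a topic.
public class RouteViewDemo {

    // Producer sends `count` messages round-robin over the brokers in its view.
    static Map<String, Integer> send(List<String> producerView, int count) {
        Map<String, Integer> stored = new HashMap<>();
        for (int i = 0; i < count; i++) {
            String broker = producerView.get(i % producerView.size());
            stored.merge(broker, 1, Integer::sum);
        }
        return stored;
    }

    // Messages stored on brokers missing from the consumer's view are never pulled.
    static int unconsumed(Map<String, Integer> stored, Set<String> consumerView) {
        int missed = 0;
        for (Map.Entry<String, Integer> e : stored.entrySet()) {
            if (!consumerView.contains(e.getKey())) {
                missed += e.getValue();
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        List<String> ns1View = List.of("b1", "b2"); // ns1 lost its connection to b3
        Set<String> ns2View = Set.of("b1", "b3");   // ns2 lost its connection to b2
        Map<String, Integer> stored = send(ns1View, 10);
        System.out.println("stored per broker: " + stored);
        System.out.println("never consumed: " + unconsumed(stored, ns2View)); // 5
    }
}
```

With 10 messages, half land on b2, and none of them are visible to a consumer whose route table comes from ns2.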

If this is correct, what should we do to prevent it from happening?

jason199101, Aug 10 '22

This will put the cluster in an inconsistent state. Publishing of messages would not be load-balanced, and subscribers may temporarily ignore messages on the broker that is not present in their route table. Other undefined behavior is also possible. This is pretty bad because it breaks the design assumption that every broker connects and reports to all name-server nodes.

To prevent this from happening, it's advisable to build a monitoring tool that periodically compares the routing table of each topic across the name server nodes. Once a name server node fails, add a replacement node and isolate the failed one from the cluster.
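The core check of such a monitor could look like the following sketch. It assumes the route tables have already been fetched from each name server separately (for example with RocketMQ's admin tooling, which is left out here) and reduced to a map of name server address to topic to broker names; the class name, method name, and addresses are hypothetical. For every topic it reports the brokers that at least one name server is missing.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RouteConsistencyCheck {

    // views: name server address -> (topic -> broker names that name server reports).
    // Returns topic -> brokers missing from at least one name server's view.
    static Map<String, Set<String>> findDivergentTopics(Map<String, Map<String, Set<String>>> views) {
        // Union of all topics seen by any name server.
        Set<String> topics = new TreeSet<>();
        for (Map<String, Set<String>> view : views.values()) {
            topics.addAll(view.keySet());
        }
        Map<String, Set<String>> divergent = new TreeMap<>();
        for (String topic : topics) {
            // Every broker that any name server reports for this topic.
            Set<String> union = new TreeSet<>();
            for (Map<String, Set<String>> view : views.values()) {
                union.addAll(view.getOrDefault(topic, Set.of()));
            }
            // Brokers that some name server does not know about.
            Set<String> missing = new TreeSet<>();
            for (Map<String, Set<String>> view : views.values()) {
                Set<String> seen = view.getOrDefault(topic, Set.of());
                for (String broker : union) {
                    if (!seen.contains(broker)) {
                        missing.add(broker);
                    }
                }
            }
            if (!missing.isEmpty()) {
                divergent.put(topic, missing);
            }
        }
        return divergent;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> views = Map.of(
                "ns1:9876", Map.of("TopicA", Set.of("b1", "b2")),
                "ns2:9876", Map.of("TopicA", Set.of("b1", "b3")));
        // A real monitor would run this on a schedule and raise an alert instead of printing.
        System.out.println(findDivergentTopics(views)); // {TopicA=[b2, b3]}
    }
}
```

Comparing against the union of all views, rather than against one "reference" name server, avoids trusting any single node that may itself be the one out of sync.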

It may also be viable to introduce a consensus algorithm among the name server nodes to further improve resilience.

lizhanhui, Aug 11 '22

As far as I know, metadata in namesrv includes the KV config and the topic route info, which producers and consumers rely on to send and consume messages.

How producers use topic route info when sending a message: https://github.com/apache/rocketmq/blob/e5a71bb95f6b8c1dcb4e44d5948469629da3833b/client/src/main/java/org/apache/rocketmq/client/impl/producer/DefaultMQProducerImpl.java#L550
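Roughly speaking, the producer expands the topic route info into a list of message queues (one entry per write queue on each broker) and picks one per send, moving round-robin and trying to avoid the broker that just failed on a retry. The sketch below is a simplified illustration of that idea under those assumptions, not the code at the linked line; `QueueSelectionSketch`, `Queue`, and `select` are hypothetical names, and the `record` requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class QueueSelectionSketch {

    // Hypothetical stand-in for a message queue derived from topic route info.
    record Queue(String broker, int queueId) {}

    private final List<Queue> queues = new ArrayList<>();
    private int index = 0;

    // Expand route info (brokers, each with the same number of write queues) into a flat list.
    QueueSelectionSketch(List<String> brokers, int writeQueuesPerBroker) {
        for (String broker : brokers) {
            for (int id = 0; id < writeQueuesPerBroker; id++) {
                queues.add(new Queue(broker, id));
            }
        }
    }

    // Round-robin pick; on a retry, skip queues on the broker that just failed if possible.
    Queue select(String lastFailedBroker) {
        for (int i = 0; i < queues.size(); i++) {
            Queue q = queues.get(Math.floorMod(index++, queues.size()));
            if (lastFailedBroker == null || !q.broker().equals(lastFailedBroker)) {
                return q;
            }
        }
        // Every queue is on the failed broker: fall back to plain round-robin.
        return queues.get(Math.floorMod(index++, queues.size()));
    }

    public static void main(String[] args) {
        // Route info as seen through ns1: only b1 and b2, two write queues each.
        QueueSelectionSketch producer = new QueueSelectionSketch(List.of("b1", "b2"), 2);
        for (int i = 0; i < 4; i++) {
            System.out.println(producer.select(null)); // b1/0, b1/1, b2/0, b2/1
        }
        System.out.println(producer.select("b1"));     // retry after b1 failed: a queue on b2
    }
}
```

Whatever the selection policy, it can only choose among brokers present in the route table the producer received, which is why a producer attached to ns1 in the original scenario never writes to b3.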

For consumers, you can try to find it yourself.
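On the consumer side, the queues a consumer can be assigned also come from its route info. As an illustration, here is a simplified sketch of an "average" allocation strategy (in the spirit of RocketMQ's averaging rebalance, but not its actual code): queues and consumer IDs are sorted, and each consumer takes a contiguous block. The class, method names, and the `broker-queueId` string format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AverageAllocationSketch {

    // Build queue names like "b1-0" from the brokers in a consumer's route view.
    static List<String> queuesFor(List<String> brokers, int queuesPerBroker) {
        List<String> queues = new ArrayList<>();
        for (String broker : brokers) {
            for (int id = 0; id < queuesPerBroker; id++) {
                queues.add(broker + "-" + id);
            }
        }
        return queues;
    }

    // Give consumer `me` a contiguous block of the sorted queue list; earlier consumers get one extra if uneven.
    static List<String> allocate(List<String> allQueues, List<String> consumers, String me) {
        List<String> queues = new ArrayList<>(allQueues);
        List<String> group = new ArrayList<>(consumers);
        Collections.sort(queues);
        Collections.sort(group);
        int index = group.indexOf(me);
        if (index < 0 || queues.isEmpty()) {
            return List.of();
        }
        int base = queues.size() / group.size();
        int extra = queues.size() % group.size();
        int start = index * base + Math.min(index, extra);
        int size = base + (index < extra ? 1 : 0);
        return new ArrayList<>(queues.subList(start, start + size));
    }

    public static void main(String[] args) {
        // Route view obtained from ns2: b2 is missing, so its queues are never candidates.
        List<String> view = queuesFor(List.of("b1", "b3"), 2);
        List<String> group = List.of("c1", "c2");
        System.out.println(allocate(view, group, "c1")); // [b1-0, b1-1]
        System.out.println(allocate(view, group, "c2")); // [b3-0, b3-1]
    }
}
```

No queue on b2 is ever handed to a consumer here. And if members of the same group get their route info from different name servers, each one computes its block over a different queue list, so the blocks can overlap or leave gaps.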

francisoliverlee, Aug 11 '22

Got it, thanks for your explanation :)

jason199101, Aug 16 '22

That's right, both consumers and producers get the route info from the name servers. However, as my question described, if the consumer and the producer get route info from different name server instances (in the same name server cluster), they may end up with different route info.

jason199101, Aug 16 '22