When the broker master fails, the send returns a timeout and the client does not retry.

Open Cczzzz opened this issue 3 years ago • 6 comments

BUG REPORT

  1. Please describe the issue you observed:
  • What did you do (the steps to reproduce)?
    While sending, kill the master.
  • What did you expect to see?
    The client does not report an error.
  • What did you see instead?
    Send error: timeout.

When the broker master fails, the client cannot fall back to another broker master. The send to the failed master returns a timeout, and the client does not retry. This is because the first attempt consumes the entire timeout, so when it times out there is no time left to retry.

DefaultMQProducerImpl#607: sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
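The behavior described above can be sketched with a small simulation. This is a simplified model and not the RocketMQ source: `Broker`, `Result`, and `send` are hypothetical names, the clock is simulated, and a dead broker is assumed to block for the whole per-call timeout (`timeout - costTime`).

```java
class SharedBudgetRetry {
    /** Hypothetical stand-in for a broker: a dead broker never answers. */
    static final class Broker {
        final String name; final boolean alive; final long latencyMs;
        Broker(String name, boolean alive, long latencyMs) {
            this.name = name; this.alive = alive; this.latencyMs = latencyMs;
        }
    }

    /** Outcome of one send call, including all retries. */
    static final class Result {
        final boolean sent; final int attempts; final long elapsedMs;
        Result(boolean sent, int attempts, long elapsedMs) {
            this.sent = sent; this.attempts = attempts; this.elapsedMs = elapsedMs;
        }
        @Override public String toString() {
            return "sent=" + sent + ", attempts=" + attempts + ", elapsedMs=" + elapsedMs;
        }
    }

    /** All attempts share a single timeout budget (assumed model of the reported behavior). */
    static Result send(Broker[] brokers, long timeoutMs, int maxAttempts) {
        long elapsed = 0; // simulated clock, plays the role of costTime
        int attempts = 0;
        for (int i = 0; i < maxAttempts; i++) {
            long remaining = timeoutMs - elapsed; // timeout - costTime
            if (remaining <= 0) {
                break; // budget exhausted, so no retry happens
            }
            Broker broker = brokers[i % brokers.length];
            attempts++;
            if (broker.alive && broker.latencyMs <= remaining) {
                elapsed += broker.latencyMs;
                return new Result(true, attempts, elapsed);
            }
            // The call fails only after waiting out the whole remaining budget.
            elapsed += remaining;
        }
        return new Result(false, attempts, elapsed);
    }

    public static void main(String[] args) {
        Broker[] brokers = {
            new Broker("broker-c", false, 0), // killed master
            new Broker("broker-a", true, 20)  // healthy master that is never tried
        };
        System.out.println(send(brokers, 10_000L, 3));
    }
}
```

In this model the healthy broker is never reached: the first call uses the full 10 s, so the loop exits before a second attempt.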

Cczzzz avatar Jul 29 '22 02:07 Cczzzz

I don't think this is a bug. In sync mode, the timeout argument is provided by the application layer, and all retries share that timeout. In async mode, it only retries once.

SeaItFover avatar Jul 31 '22 02:07 SeaItFover

It causes the client to fail when the master goes down. Is this reasonable? I always thought a master failure was transparent to the client.

Cczzzz avatar Aug 02 '22 08:08 Cczzzz

When I set the timeout to 10s, it still does not retry correctly:
2022-08-02 16:22:05,917 WARN RocketmqClient - sendKernelImpl exception, resend at once, InvokeID: -2111425297163537192, RT: 10015ms, Broker: MessageQueue [topic=CONNECT, brokerName=broker-c, queueId=0]

Cczzzz avatar Aug 02 '22 08:08 Cczzzz

https://github.com/apache/rocketmq/pull/3555 @Cczzzz Verify that this PR solves this issue

duhenglucky avatar Aug 02 '22 09:08 duhenglucky

@duhenglucky No, this is a different issue.

Cczzzz avatar Aug 04 '22 06:08 Cczzzz

This is because the first send exceeds the configured timeout, so the client does not continue to retry. I think this restriction should be removed, and the timeout should be recalculated on each retry.
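A minimal sketch of that idea, assuming each attempt gets its own full timeout instead of sharing one budget. It is an illustration only and not a proposed patch: `Broker`, `Result`, and `send` are hypothetical names, the clock is simulated, and a dead broker is assumed to block for the whole per-attempt timeout.

```java
class PerAttemptTimeoutRetry {
    /** Hypothetical stand-in for a broker: a dead broker never answers. */
    static final class Broker {
        final String name; final boolean alive; final long latencyMs;
        Broker(String name, boolean alive, long latencyMs) {
            this.name = name; this.alive = alive; this.latencyMs = latencyMs;
        }
    }

    /** Outcome of one send call, including all retries. */
    static final class Result {
        final boolean sent; final int attempts; final long elapsedMs;
        Result(boolean sent, int attempts, long elapsedMs) {
            this.sent = sent; this.attempts = attempts; this.elapsedMs = elapsedMs;
        }
        @Override public String toString() {
            return "sent=" + sent + ", attempts=" + attempts + ", elapsedMs=" + elapsedMs;
        }
    }

    /** The timeout is recalculated for every attempt instead of being shared. */
    static Result send(Broker[] brokers, long perAttemptTimeoutMs, int maxAttempts) {
        long elapsed = 0; // simulated clock
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Broker broker = brokers[(attempt - 1) % brokers.length];
            if (broker.alive && broker.latencyMs <= perAttemptTimeoutMs) {
                elapsed += broker.latencyMs;
                return new Result(true, attempt, elapsed);
            }
            // This attempt timed out; the next one starts with a fresh timeout.
            elapsed += perAttemptTimeoutMs;
        }
        return new Result(false, maxAttempts, elapsed);
    }

    public static void main(String[] args) {
        Broker[] brokers = {
            new Broker("broker-c", false, 0), // killed master
            new Broker("broker-a", true, 20)  // healthy master, reached on the retry
        };
        System.out.println(send(brokers, 10_000L, 3));
    }
}
```

The trade-off is that the worst-case latency of one send grows to `maxAttempts * perAttemptTimeoutMs`, so the timeout passed by the application no longer bounds the whole call.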

panzhi33 avatar Aug 04 '22 07:08 panzhi33

Finally, this issue is raised again.

IMO, the retry strategy on the client side needs significant refinement. I suggest creating a RIP to improve this. We need something similar to the gRPC client retry strategy: https://github.com/grpc/proposal/blob/master/A6-client-retries.md

lizhanhui avatar Aug 11 '22 03:08 lizhanhui

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

github-actions[bot] avatar Aug 12 '23 00:08 github-actions[bot]

This issue was closed because it has been inactive for 3 days since being marked as stale.

github-actions[bot] avatar Aug 15 '23 00:08 github-actions[bot]