RocketMQ: when the broker master fails, the send returns a timeout and the client does not retry

BUG REPORT

Please describe the issue you observed:

What did you do (The steps to reproduce)?

While sending, kill the master.

What is expected to see?

The client does not report an error.

What did you see instead?

Send error, timeout.

When a broker master fails, the client cannot switch to another broker master. The send to the failed master returns a timeout, and the client does not retry. This is because the first attempt consumes the entire timeout, so there is no time left for a retry.

DefaultMQProducerImpl#607: sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
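
To make the behaviour described above concrete, here is a minimal sketch of a retry loop in which every attempt draws from one shared timeout budget. It is a simplified model, not the actual DefaultMQProducerImpl code: the class name, method name, and the simulated response times are all hypothetical, and a dead broker is modelled as one that never answers.

```java
public class SharedBudgetRetryDemo {
    /**
     * Simulates a sync send where all attempts share one timeout budget.
     * responseMillis[i] is how long broker i takes to answer; a dead broker
     * is modelled as Long.MAX_VALUE (hypothetical convention for this sketch).
     * Returns the index of the broker that accepted the message, or -1 if
     * the budget ran out first.
     */
    static int sendWithSharedBudget(long[] responseMillis, long timeoutMillis, int maxAttempts) {
        long elapsed = 0;
        for (int attempt = 0; attempt < maxAttempts && attempt < responseMillis.length; attempt++) {
            long remaining = timeoutMillis - elapsed; // mirrors "timeout - costTime"
            if (remaining <= 0) {
                return -1; // budget exhausted: no retry is possible
            }
            if (responseMillis[attempt] <= remaining) {
                return attempt; // this broker answered in time
            }
            elapsed += remaining; // the attempt blocks until its deadline
        }
        return -1;
    }

    public static void main(String[] args) {
        long dead = Long.MAX_VALUE;
        // First broker is dead, second is healthy, but the first attempt
        // already used the whole 3000 ms, so the result is -1.
        System.out.println(sendWithSharedBudget(new long[] {dead, 50}, 3000, 3));
    }
}
```

In this model the healthy second broker is never tried, which matches the symptom reported here: a single timed-out attempt leaves nothing for the retries.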

Jul 29 '22 02:07 Cczzzz

I don't think this is a bug. In sync mode, the timeout argument is provided by the application layer, and all retries together share that total timeout. In async mode, the send is retried only once.

Jul 31 '22 02:07 SeaItFover

This causes the client to fail when the master goes down. Is that reasonable? I always thought the client would not notice a master failure.

Aug 02 '22 08:08 Cczzzz

When I set the timeout to 10s, it still does not retry correctly:

2022-08-02 16:22:05,917 WARN RocketmqClient - sendKernelImpl exception, resend at once, InvokeID: -2111425297163537192, RT: 10015ms, Broker: MessageQueue [topic=CONNECT, brokerName=broker-c, queueId=0]

Aug 02 '22 08:08 Cczzzz

https://github.com/apache/rocketmq/pull/3555 @Cczzzz Verify that this PR solves this issue

Aug 02 '22 09:08 duhenglucky

@duhenglucky No, it is a different problem.

Aug 04 '22 06:08 Cczzzz

This is because the first send exceeds the configured timeout, so the client does not continue to retry. I think this restriction should be removed and the timeout should be recalculated for each retry.
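
One way to read this suggestion is to give every attempt its own timeout instead of a shared budget. The sketch below is only an illustration under that assumption, not a patch against RocketMQ: the class, the method, and the simulated response times are hypothetical, and a dead broker is modelled as one that never answers.

```java
public class PerAttemptTimeoutDemo {
    /**
     * Simulates a send where each attempt gets a fresh timeout.
     * responseMillis[i] is how long broker i takes to answer; a dead broker
     * is modelled as Long.MAX_VALUE (hypothetical convention for this sketch).
     * Returns {brokerIndex, totalElapsedMillis}; brokerIndex is -1 if every
     * attempt timed out.
     */
    static long[] sendWithPerAttemptTimeout(long[] responseMillis, long perAttemptMillis, int maxAttempts) {
        long elapsed = 0;
        for (int attempt = 0; attempt < maxAttempts && attempt < responseMillis.length; attempt++) {
            long needed = responseMillis[attempt];
            if (needed <= perAttemptMillis) {
                return new long[] {attempt, elapsed + needed};
            }
            elapsed += perAttemptMillis; // this attempt timed out; the next one starts fresh
        }
        return new long[] {-1, elapsed};
    }

    public static void main(String[] args) {
        long dead = Long.MAX_VALUE;
        long[] result = sendWithPerAttemptTimeout(new long[] {dead, 50}, 3000, 3);
        System.out.println("broker=" + result[0] + " elapsed=" + result[1] + "ms");
    }
}
```

The trade-off is that the caller may now wait up to maxAttempts times the per-attempt timeout, so the value passed by the application would no longer bound the whole call.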

Aug 04 '22 07:08 panzhi33

Finally, this issue is raised again.

IMO, the retry strategy on the client side needs significant refinement. I suggest creating a RIP to improve this. We need something similar to the gRPC client retry strategy: https://github.com/grpc/proposal/blob/master/A6-client-retries.md
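
For readers unfamiliar with that proposal, its retry policy (as I understand it) spaces retries with exponential backoff plus jitter: the n-th retry waits a random time between 0 and min(initialBackoff * backoffMultiplier^(n-1), maxBackoff). A rough sketch of that delay calculation follows. The class and method names are hypothetical, and this is neither gRPC nor RocketMQ code.

```java
import java.util.Random;

public class RetryBackoffDemo {
    /** Upper bound of the delay before the given retry (1-based), capped at maxMillis. */
    static long backoffCeilingMillis(int retry, long initialMillis, long maxMillis, double multiplier) {
        double ceiling = initialMillis * Math.pow(multiplier, retry - 1);
        return (long) Math.min(ceiling, (double) maxMillis);
    }

    /** Jittered delay: uniformly random in [0, ceiling). */
    static long nextDelayMillis(int retry, long initialMillis, long maxMillis, double multiplier, Random random) {
        long ceiling = backoffCeilingMillis(retry, initialMillis, maxMillis, multiplier);
        return (long) (random.nextDouble() * ceiling);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int retry = 1; retry <= 5; retry++) {
            System.out.println("retry " + retry
                + ": ceiling=" + backoffCeilingMillis(retry, 100, 1000, 2.0) + "ms"
                + ", delay=" + nextDelayMillis(retry, 100, 1000, 2.0, random) + "ms");
        }
    }
}
```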

Aug 11 '22 03:08 lizhanhui

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

Aug 12 '23 00:08 github-actions[bot]

This issue was closed because it has been inactive for 3 days since being marked as stale.

Aug 15 '23 00:08 github-actions[bot]