When the broker master fails, the send returns a timeout and the client does not retry
BUG REPORT
- Please describe the issue you observed:
- What did you do (The steps to reproduce)?
  While sending messages, kill the broker master.
- What is expected to see?
  The client does not report an error.
- What did you see instead?
  The send fails with a timeout error.
When the broker master fails, the client cannot switch to another broker master. The send returns a timeout and the client does not retry. This is because the first attempt consumes the entire timeout, so there is no time left for a retry.
DefaultMQProducerImpl#607: sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
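
To make the effect concrete, here is a minimal simulation of a retry loop in which every attempt draws from one shared timeout budget, as the `timeout - costTime` argument above suggests. It is a simplified model and not the actual RocketMQ code: the class name, method name, and the `deadBrokers` parameter are hypothetical, and time is simulated rather than measured.

```java
final class SharedBudgetRetry {
    /**
     * Simulates a send loop where all attempts share one timeout budget.
     * Assumption: the first deadBrokers attempts hit a broker that never
     * answers, so each of them blocks for the whole timeout it was given.
     * Returns the 1-based index of the attempt that succeeded, or -1.
     */
    static int successfulAttempt(long timeoutMs, int maxAttempts, int deadBrokers) {
        long costTime = 0; // simulated time spent so far
        for (int times = 0; times < maxAttempts; times++) {
            long remaining = timeoutMs - costTime;
            if (remaining <= 0) {
                break; // budget exhausted, no retry is possible
            }
            if (times >= deadBrokers) {
                return times + 1; // healthy broker, the send succeeds
            }
            costTime += remaining; // dead broker: blocks until the granted timeout expires
        }
        return -1;
    }

    public static void main(String[] args) {
        // One dead master, 3 attempts allowed, 3000 ms budget.
        System.out.println(successfulAttempt(3000, 3, 1)); // prints -1
        // No dead master: the first attempt succeeds.
        System.out.println(successfulAttempt(3000, 3, 0)); // prints 1
    }
}
```

In this model a single unresponsive master is enough to use up the budget, so the remaining attempts never start, which matches the behavior described above.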
I don't think this is a bug. In sync mode, the timeout argument is provided by the application layer, and all retries together draw from that total timeout. In async mode, the send is retried only once.
It will cause the client to fail when the master goes down. Is this reasonable? I always thought the client was insensitive to a master failure.
When I set the timeout to 10s, the send still is not retried correctly:
2022-08-02 16:22:05,917 WARN RocketmqClient - sendKernelImpl exception, resend at once, InvokeID: -2111425297163537192, RT: 10015ms, Broker: MessageQueue [topic=CONNECT, brokerName=broker-c, queueId=0]
https://github.com/apache/rocketmq/pull/3555 @Cczzzz Verify that this PR solves this issue
@duhenglucky No, this is a different problem.
This is because the first send exceeds the configured timeout, so the client does not continue to retry. I think this restriction should be removed and the timeout should be recalculated on each retry.
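
One possible reading of "recalculate the timeout when retrying" is to split the remaining budget evenly across the attempts that are still allowed, so that a dead master cannot use up everything. The sketch below simulates that idea. It is only an illustration, not a proposed patch: the class name, method name, and `deadBrokers` parameter are hypothetical, and time is simulated.

```java
final class SplitBudgetRetry {
    /**
     * Simulates a send loop where each attempt is granted an equal share of
     * the budget that is still left, instead of all of it.
     * Assumption: the first deadBrokers attempts hit a broker that never
     * answers and therefore block for their whole granted timeout.
     * Returns the 1-based index of the attempt that succeeded, or -1.
     */
    static int successfulAttempt(long timeoutMs, int maxAttempts, int deadBrokers) {
        long costTime = 0; // simulated time spent so far
        for (int times = 0; times < maxAttempts; times++) {
            long remaining = timeoutMs - costTime;
            int attemptsLeft = maxAttempts - times;
            long granted = remaining / attemptsLeft; // per-attempt timeout, recalculated each time
            if (granted <= 0) {
                break; // nothing left to hand out
            }
            if (times >= deadBrokers) {
                return times + 1; // healthy broker, the send succeeds
            }
            costTime += granted; // dead broker: only its own share is lost
        }
        return -1;
    }

    public static void main(String[] args) {
        // One dead master, 3 attempts allowed, 3000 ms budget.
        System.out.println(successfulAttempt(3000, 3, 1)); // prints 2
        // Every broker is dead: all attempts fail within the 3000 ms budget.
        System.out.println(successfulAttempt(3000, 3, 3)); // prints -1
    }
}
```

This keeps the total time within what the application asked for while still leaving room for a retry on another broker. The trade-off is that each individual attempt gets a shorter timeout.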
Finally, this issue is raised again.
IMO, the retry strategy on the client side needs significant refinement. I suggest creating a RIP to improve this. We need something similar to the gRPC client retry strategy: https://github.com/grpc/proposal/blob/master/A6-client-retries.md
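
For reference, the linked gRPC proposal configures retries with a maximum attempt count, an initial backoff, a maximum backoff, and a backoff multiplier, and picks each delay at random between zero and the current backoff ceiling. A rough Java sketch of that shape is below. The class and method names are made up for illustration and are not part of RocketMQ or gRPC.

```java
import java.util.concurrent.ThreadLocalRandom;

final class BackoffRetryPolicy {
    final int maxAttempts;
    final long initialBackoffMs;
    final long maxBackoffMs;
    final double backoffMultiplier;

    BackoffRetryPolicy(int maxAttempts, long initialBackoffMs,
                       long maxBackoffMs, double backoffMultiplier) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.backoffMultiplier = backoffMultiplier;
    }

    /** Upper bound of the delay before the n-th retry (n starts at 1). */
    long backoffCeilingMs(int retry) {
        double raw = initialBackoffMs * Math.pow(backoffMultiplier, retry - 1);
        return (long) Math.min(raw, (double) maxBackoffMs);
    }

    /** Jittered delay, uniform between 0 and the ceiling (inclusive). */
    long nextDelayMs(int retry) {
        return ThreadLocalRandom.current().nextLong(backoffCeilingMs(retry) + 1);
    }

    /** True while another attempt is still allowed. */
    boolean shouldRetry(int attemptsSoFar) {
        return attemptsSoFar < maxAttempts;
    }

    public static void main(String[] args) {
        BackoffRetryPolicy policy = new BackoffRetryPolicy(4, 100, 1000, 2.0);
        for (int retry = 1; policy.shouldRetry(retry); retry++) {
            System.out.println("retry " + retry + " ceiling "
                    + policy.backoffCeilingMs(retry) + " ms");
        }
    }
}
```

A policy like this would still need to be combined with a rule for how the overall send timeout is divided among attempts, which is the part this issue is about.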
This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.
This issue was closed because it has been inactive for 3 days since being marked as stale.