UpdateTopicRouteInfoFromNameServer method in the concurrent lack of detection
#4561 问题: updateTopicRouteInfoFromNameServer在某些场景下lockNamesrv代码块的内部缺少校验,可能会导致RemotingTooMuchRequestException("sendDefaultImpl call timeout")异常。
描述: 生产者发送消息时,会通过tryToFindTopicPublishInfo获取路由信息,如果没有缓存或没有messageQueue时会调用updateTopicRouteInfoFromNameServer向nameServer查询该topic的路由信息,在此方法内部有一个nameServer锁,这里极端情况下会出现一些问题,如果在同一时刻有很多个线程排队取lockNamesrv锁,假如前面的线程已通过nameServer拉取到路由信息了,在释放锁之后,后续的线程是没有必要重复与nameServer数据交互的,在网络不好等情况下很可能造成后面排队的线程超时(默认值3s),然后抛throw new RemotingTooMuchRequestException("sendDefaultImpl call timeout")异常。举个较为极端的例子:100个线程使用同一个producer,大家都第一次发送某个topic的消息。
Today did encounter this problem, the network is bad and many times to send the same topic messages and then throw the RemotingTooMuchRequestException
Today did encounter this problem, the network is bad and many times to send the same topic messages and then throw the RemotingTooMuchRequestException
Yes, this is also a problem I stumbled across recently, hopefully it can be fixed
According to the job log, it is successful. Why is build Failed displayed?
@15712852007 Travis CI failed, pls check it.
Codecov Report
Merging #4562 (25ed8b9) into develop (6f5cd4a) will decrease coverage by
0.20%. The diff coverage is100.00%.
@@ Coverage Diff @@
## develop #4562 +/- ##
=============================================
- Coverage 48.19% 47.98% -0.21%
+ Complexity 5132 5106 -26
=============================================
Files 649 649
Lines 43037 43040 +3
Branches 5629 5630 +1
=============================================
- Hits 20740 20653 -87
- Misses 19785 19871 +86
- Partials 2512 2516 +4
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...rocketmq/client/impl/factory/MQClientInstance.java | 50.75% <100.00%> (-0.54%) |
:arrow_down: |
| ...mq/client/impl/producer/DefaultMQProducerImpl.java | 44.43% <100.00%> (-0.38%) |
:arrow_down: |
| ...rg/apache/rocketmq/common/stats/StatsSnapshot.java | 84.61% <0.00%> (-15.39%) |
:arrow_down: |
| ...rocketmq/broker/filtersrv/FilterServerManager.java | 20.00% <0.00%> (-14.29%) |
:arrow_down: |
| ...ache/rocketmq/common/stats/MomentStatsItemSet.java | 39.13% <0.00%> (-13.05%) |
:arrow_down: |
| .../apache/rocketmq/common/stats/MomentStatsItem.java | 38.09% <0.00%> (-9.53%) |
:arrow_down: |
| ...a/org/apache/rocketmq/logging/inner/SysLogger.java | 28.57% <0.00%> (-5.72%) |
:arrow_down: |
| ...in/java/org/apache/rocketmq/test/util/MQAdmin.java | 38.88% <0.00%> (-5.56%) |
:arrow_down: |
| ...va/org/apache/rocketmq/common/stats/StatsItem.java | 50.83% <0.00%> (-5.00%) |
:arrow_down: |
| ...java/org/apache/rocketmq/logging/inner/Logger.java | 51.67% <0.00%> (-4.31%) |
:arrow_down: |
| ... and 19 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 6f5cd4a...25ed8b9. Read the comment docs.
Coverage increased (+0.08%) to 52.084% when pulling 25ed8b9a83efdf0c91e7e6086920af91264f99b7 on 15712852007:hotfix/updateTopicRouteInfo_doubleCheckLock into 6f5cd4aa40de1a121d2d47f8b9175a86c0e7bcc2 on apache:develop.
Coverage decreased (-4.4%) to 47.571% when pulling 25ed8b9a83efdf0c91e7e6086920af91264f99b7 on 15712852007:hotfix/updateTopicRouteInfo_doubleCheckLock into 6f5cd4aa40de1a121d2d47f8b9175a86c0e7bcc2 on apache:develop.
@duhenglucky @RongtongJin @lizhanhui All checks have passed, please help review and merge, thank you
This PR is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs. If you wish not to mark it as stale, please leave a comment in this PR.
This PR was closed because it has been inactive for 3 days since being marked as stale.
Encountering same issue here. Looks like this PR didn't get merged in the end. Any plan to fix this in the future?