rocketmq icon indicating copy to clipboard operation
rocketmq copied to clipboard

UpdateTopicRouteInfoFromNameServer method in the concurrent lack of detection

Open zhaowenshuai011 opened this issue 3 years ago • 8 comments

#4561 问题: updateTopicRouteInfoFromNameServer在某些场景下lockNamesrv代码块的内部缺少校验,可能会导致RemotingTooMuchRequestException("sendDefaultImpl call timeout")异常。

描述: 生产者发送消息时,会通过tryToFindTopicPublishInfo获取路由信息,如果没有缓存或没有messageQueue时会调用updateTopicRouteInfoFromNameServer向nameServer查询该topic的路由信息,在此方法内部有一个nameServer锁,这里极端情况下会出现一些问题,如果在同一时刻有很多个线程排队取lockNamesrv锁,假如前面的线程已通过nameServer拉取到路由信息了,在释放锁之后,后续的线程是没有必要重复与nameServer数据交互的,在网络不好等情况下很可能造成后面排队的线程超时(默认值3s),然后抛throw new RemotingTooMuchRequestException("sendDefaultImpl call timeout")异常。举个较为极端的例子:100个线程使用同一个producer,大家都第一次发送某个topic的消息。

zhaowenshuai011 avatar Jul 05 '22 10:07 zhaowenshuai011

Today did encounter this problem, the network is bad and many times to send the same topic messages and then throw the RemotingTooMuchRequestException

Oliverwqcwrw avatar Jul 06 '22 07:07 Oliverwqcwrw

Today did encounter this problem, the network is bad and many times to send the same topic messages and then throw the RemotingTooMuchRequestException

Yes, this is also a problem I stumbled across recently, hopefully it can be fixed

zhaowenshuai011 avatar Jul 07 '22 01:07 zhaowenshuai011

According to the job log, it is successful. Why is build Failed displayed?

zhaowenshuai011 avatar Jul 08 '22 08:07 zhaowenshuai011

@15712852007 Travis CI failed, pls check it.

lizhiboo avatar Jul 11 '22 01:07 lizhiboo

Codecov Report

Merging #4562 (25ed8b9) into develop (6f5cd4a) will decrease coverage by 0.20%. The diff coverage is 100.00%.

@@              Coverage Diff              @@
##             develop    #4562      +/-   ##
=============================================
- Coverage      48.19%   47.98%   -0.21%     
+ Complexity      5132     5106      -26     
=============================================
  Files            649      649              
  Lines          43037    43040       +3     
  Branches        5629     5630       +1     
=============================================
- Hits           20740    20653      -87     
- Misses         19785    19871      +86     
- Partials        2512     2516       +4     
Impacted Files Coverage Δ
...rocketmq/client/impl/factory/MQClientInstance.java 50.75% <100.00%> (-0.54%) :arrow_down:
...mq/client/impl/producer/DefaultMQProducerImpl.java 44.43% <100.00%> (-0.38%) :arrow_down:
...rg/apache/rocketmq/common/stats/StatsSnapshot.java 84.61% <0.00%> (-15.39%) :arrow_down:
...rocketmq/broker/filtersrv/FilterServerManager.java 20.00% <0.00%> (-14.29%) :arrow_down:
...ache/rocketmq/common/stats/MomentStatsItemSet.java 39.13% <0.00%> (-13.05%) :arrow_down:
.../apache/rocketmq/common/stats/MomentStatsItem.java 38.09% <0.00%> (-9.53%) :arrow_down:
...a/org/apache/rocketmq/logging/inner/SysLogger.java 28.57% <0.00%> (-5.72%) :arrow_down:
...in/java/org/apache/rocketmq/test/util/MQAdmin.java 38.88% <0.00%> (-5.56%) :arrow_down:
...va/org/apache/rocketmq/common/stats/StatsItem.java 50.83% <0.00%> (-5.00%) :arrow_down:
...java/org/apache/rocketmq/logging/inner/Logger.java 51.67% <0.00%> (-4.31%) :arrow_down:
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6f5cd4a...25ed8b9. Read the comment docs.

codecov-commenter avatar Jul 13 '22 05:07 codecov-commenter

Coverage Status

Coverage increased (+0.08%) to 52.084% when pulling 25ed8b9a83efdf0c91e7e6086920af91264f99b7 on 15712852007:hotfix/updateTopicRouteInfo_doubleCheckLock into 6f5cd4aa40de1a121d2d47f8b9175a86c0e7bcc2 on apache:develop.

coveralls avatar Jul 13 '22 05:07 coveralls

Coverage Status

Coverage decreased (-4.4%) to 47.571% when pulling 25ed8b9a83efdf0c91e7e6086920af91264f99b7 on 15712852007:hotfix/updateTopicRouteInfo_doubleCheckLock into 6f5cd4aa40de1a121d2d47f8b9175a86c0e7bcc2 on apache:develop.

coveralls avatar Jul 13 '22 05:07 coveralls

@duhenglucky @RongtongJin @lizhanhui All checks have passed, please help review and merge, thank you

zhaowenshuai011 avatar Sep 01 '22 14:09 zhaowenshuai011

This PR is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs. If you wish not to mark it as stale, please leave a comment in this PR.

github-actions[bot] avatar Sep 07 '23 00:09 github-actions[bot]

This PR was closed because it has been inactive for 3 days since being marked as stale.

github-actions[bot] avatar Sep 10 '23 00:09 github-actions[bot]

Encountering same issue here. Looks like this PR didn't get merged in the end. Any plan to fix this in the future?

Hyperkopite avatar Feb 02 '24 10:02 Hyperkopite