rocketmq icon indicating copy to clipboard operation
rocketmq copied to clipboard

[ISSUE #2476] Fix HaClient Infinite loop

Open crazywen opened this issue 5 years ago • 7 comments

try to fix epollwait no block, trash selector, regen it!

What is the purpose of the change

fix epollwait no block https://github.com/apache/rocketmq/issues/2487

Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist(the last one is not necessary)before request the community to review your PR.

  • [x] Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • [x] Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • [x] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • [x] Write necessary unit-test(over 80% coverage) to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • [x] Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

crazywen avatar Dec 04 '20 15:12 crazywen

Coverage Status

Coverage increased (+0.5%) to 51.832% when pulling cc68e7d62cc65bb2707c6064166141295382f144 on crazywen:master into 9f95a972e10e0681bc3f2d00e9957aa212e897b5 on apache:master.

coveralls avatar Dec 08 '20 11:12 coveralls

@crazywen The original reason is that the HAClient.READ_MAX_BUFFER_SIZE muse be equal or larger than the haTransferBatchSize.

Otherwise, the slave HAClient cannot read all the data to byteBufferRead as it has been full already, and the reallocation will take no effect, the new buffer will still be full.

image

dongeforever avatar Dec 16 '20 07:12 dongeforever

@crazywen It is not a good way to fix the BUFFER SIZE. You could introduce another config(for example haReceiveBatchSize) to control the HAClient read buffer size, which is now fixed to 4M by READ_MAX_BUFFER_SIZE. And then make sure the haTransferBatchSize is equal or smaller than haReceiveBatchSize.

dongeforever avatar Dec 16 '20 07:12 dongeforever

@crazywen It is not a good way to fix the BUFFER SIZE. You could introduce another config(for example haReceiveBatchSize) to control the HAClient read buffer size, which is now fixed to 4M by READ_MAX_BUFFER_SIZE. And then make sure the haTransferBatchSize is equal or smaller than haReceiveBatchSize.

crazywen avatar Dec 22 '20 01:12 crazywen

@crazywen It is not a good way to fix the BUFFER SIZE. You could introduce another config(for example haReceiveBatchSize) to control the HAClient read buffer size, which is now fixed to 4M by READ_MAX_BUFFER_SIZE. And then make sure the haTransferBatchSize is equal or smaller than haReceiveBatchSize.

haReceiveBatchSize also should have the max size, and the property is valid in slave, and haTransferBatchSize is valid in master, if like what you suggest, the master property should depend on the salve ?

crazywen avatar Dec 22 '20 01:12 crazywen

Codecov Report

Merging #2487 (cc68e7d) into master (9f95a97) will increase coverage by 0.76%. The diff coverage is 25.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2487      +/-   ##
============================================
+ Coverage     45.69%   46.46%   +0.76%     
- Complexity     4271     4551     +280     
============================================
  Files           546      547       +1     
  Lines         35860    38475    +2615     
  Branches       4763     5551     +788     
============================================
+ Hits          16388    17876    +1488     
- Misses        17429    18432    +1003     
- Partials       2043     2167     +124     
Impacted Files Coverage Δ Complexity Δ
...ache/rocketmq/store/config/MessageStoreConfig.java 63.68% <25.00%> (+3.68%) 117.00 <1.00> (+40.00)
...e/rocketmq/client/exception/MQBrokerException.java 56.25% <0.00%> (-10.42%) 3.00% <0.00%> (+2.00%) :arrow_down:
...ketmq/client/impl/consumer/PullMessageService.java 71.11% <0.00%> (-8.89%) 8.00% <0.00%> (-2.00%)
...in/java/org/apache/rocketmq/test/util/MQAdmin.java 38.88% <0.00%> (-5.56%) 7.00% <0.00%> (-1.00%)
...ache/rocketmq/common/stats/MomentStatsItemSet.java 39.13% <0.00%> (-4.35%) 5.00% <0.00%> (-1.00%)
...ketmq/common/protocol/body/ConsumerConnection.java 95.83% <0.00%> (-4.17%) 13.00% <0.00%> (-1.00%)
...mq/client/trace/hook/SendMessageTraceHookImpl.java 67.34% <0.00%> (-3.39%) 4.00% <0.00%> (ø%)
...ava/org/apache/rocketmq/test/util/VerifyUtils.java 46.26% <0.00%> (-2.99%) 14.00% <0.00%> (-1.00%)
...nt/impl/consumer/ConsumeMessageOrderlyService.java 42.59% <0.00%> (-2.89%) 16.00% <0.00%> (-3.00%)
...ava/org/apache/rocketmq/filter/util/BitsArray.java 59.82% <0.00%> (-2.57%) 30.00% <0.00%> (-1.00%)
... and 39 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9f95a97...cc68e7d. Read the comment docs.

codecov-io avatar Dec 22 '20 01:12 codecov-io

@crazywen Usually, the config of each node should keep the same in the same cluster(or at least for pair of master-slave which has the same broker name). If so, just make sure haTransferBatchSize is equal or smaller than haReceiveBatchSize in the config file.

dongeforever avatar Dec 25 '20 06:12 dongeforever

This PR is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs. If you wish not to mark it as stale, please leave a comment in this PR.

github-actions[bot] avatar Feb 22 '24 00:02 github-actions[bot]

This PR was closed because it has been inactive for 3 days since being marked as stale.

github-actions[bot] avatar Feb 25 '24 00:02 github-actions[bot]