ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-10483. Container Balancer should only move containers with size greater than 0 bytes

Open sarvekshayr opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

Introduced a check on the size containers allowed to move during the balancing process.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10483

How was this patch tested?

The patch was tested using the existing tests containerBalancerShouldObeyMaxSizeToMoveLimit() and balancerShouldObeyMaxSizeEnteringTargetLimit() in TestContainerBalancerTask.java. It passed successfully, ensuring that the balancer selected only containers with a size greater than 0 bytes, even when zero-byte containers were present.

sarvekshayr avatar Mar 27 '24 07:03 sarvekshayr

Why is this change needed? If there is a skew in zero size objects it will show in the heartbeat load form a Datanode.

kerneltime avatar Mar 27 '24 17:03 kerneltime

Why is this change needed? If there is a skew in zero size objects it will show in the heartbeat load form a Datanode.

@kerneltime Its observed containerInfo is going -ve or "0" even there are blocks in it. This is observed and still root cause is not known. A workaround is handled to avoid delete container by using a flag earlier. ContainerBalancer also have impact because of this and this PR will try to solve (till the actual problem is not found/resolved).

sumitagrawl avatar Mar 28 '24 05:03 sumitagrawl

@sarvekshayr Thanks for working on this. The code change looks good, but we should have an explicit unit test that asserts balancer only selects positive size containers. This ensures we don't have to depend on another unit test that's meant to test a different behaviour.

Added a unit test to assert the expected behaviour of the balancer.

sarvekshayr avatar Apr 03 '24 05:04 sarvekshayr

@sarvekshayr can you look into the CI failures?

kerneltime avatar Apr 08 '24 16:04 kerneltime

The balancer test had probably failed because of https://github.com/apache/ozone/pull/6481. Recon failure looks unrelated.

siddhantsangwan avatar Apr 10 '24 05:04 siddhantsangwan

Merged to master with green CI. Thanks @sarvekshayr and all the reviewers.

siddhantsangwan avatar Apr 10 '24 08:04 siddhantsangwan