fix testAlignByDevice2Device3Region

Open kabo87777 opened this issue 3 months ago • 0 comments

Fix Non-Deterministic Behavior in AggregationDistributionTest.testAlignByDevice2Device3Region

Problem

The test testAlignByDevice2Device3Region fails non-deterministically under NonDex: 4 out of 5 runs failed (80%) because the test accesses fragments by position and therefore depends on iteration order.

Way to Reproduce

cd iotdb-core/datanode
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex \
  -Dtest=AggregationDistributionTest#testAlignByDevice2Device3Region \
  -DnondexRuns=5

# Expected: Test fails with certain NonDex seeds (e.g., 974622, 1016066, 1057510, 1098954)
# Failure: AssertionError when assertions expect specific fragment order

Root Cause

The test accessed fragment instances and their children by fixed list indices (get(0), get(1)), assuming they would always appear in a specific order:

PlanNode f1Root = plan.getInstances().get(0).getFragment().getPlanNodeTree();
PlanNode f2Root = plan.getInstances().get(1).getFragment().getPlanNodeTree();
assertTrue(f1Root instanceof AggregationMergeSortNode);
assertTrue(f2Root instanceof DeviceViewNode);

When NonDex shuffled collection iteration order during distributed query planning, fragment instances and their plan tree children appeared in different orders, causing the test to fail even though the query plan was semantically correct.
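
The failure mode can be reproduced outside IoTDB. The following is a minimal sketch, not code from the test: the strings are hypothetical stand-ins for the fragment root types, and a seeded Collections.shuffle is only an illustrative substitute for the iteration-order perturbation that NonDex applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: strings stand in for fragment root node types.
class OrderDependenceSketch {

  // Simulates a planner whose output order is not guaranteed (assumption: shuffle ~ NonDex).
  static List<String> shuffledRoots(long seed) {
    List<String> roots =
        new ArrayList<>(Arrays.asList("AggregationMergeSortNode", "DeviceViewNode"));
    Collections.shuffle(roots, new Random(seed));
    return roots;
  }

  // Position-based check: passes only if the expected type happens to come first.
  static boolean positionalCheck(List<String> roots) {
    return roots.get(0).equals("AggregationMergeSortNode");
  }

  // Order-independent check: passes for every permutation of the same elements.
  static boolean orderIndependentCheck(List<String> roots) {
    return Collections.frequency(roots, "AggregationMergeSortNode") == 1
        && Collections.frequency(roots, "DeviceViewNode") == 1;
  }

  public static void main(String[] args) {
    for (long seed = 0; seed < 5; seed++) {
      List<String> roots = shuffledRoots(seed);
      System.out.println(
          roots
              + " positional=" + positionalCheck(roots)
              + " orderIndependent=" + orderIndependentCheck(roots));
    }
  }
}
```

The positional check depends on which permutation a given seed produces, while the frequency-based check holds for all of them.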

Solution

Made the test order-independent by counting node types across the entire plan tree instead of checking specific positions:

1. Added Helper Method for Recursive Tree Traversal

// Helper method to count nodes of a specific type in a plan tree
private int countNodesOfType(PlanNode root, Class<?> nodeType) {
  if (root == null) {
    return 0;
  }
  int count = nodeType.isInstance(root) ? 1 : 0;
  for (PlanNode child : root.getChildren()) {
    count += countNodesOfType(child, nodeType);
  }
  return count;
}
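
To try the helper in isolation, here is a self-contained sketch under stated assumptions: Node, MergeNode, ViewNode and ScanNode are hypothetical stand-ins for PlanNode and its subclasses, and sampleTree is an invented fixture. Only the traversal logic mirrors the helper above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CountNodesSketch {

  // Hypothetical stand-in for PlanNode: it only exposes its children.
  static class Node {
    private final List<Node> children;

    Node(Node... children) {
      this.children = new ArrayList<>(Arrays.asList(children));
    }

    List<Node> getChildren() {
      return children;
    }
  }

  // Hypothetical node types used only for this illustration.
  static class MergeNode extends Node {
    MergeNode(Node... children) { super(children); }
  }

  static class ViewNode extends Node {
    ViewNode(Node... children) { super(children); }
  }

  static class ScanNode extends Node {
    ScanNode(Node... children) { super(children); }
  }

  // Same recursion as the helper: count the root if it matches, then recurse into children.
  static int countNodesOfType(Node root, Class<?> nodeType) {
    if (root == null) {
      return 0;
    }
    int count = nodeType.isInstance(root) ? 1 : 0;
    for (Node child : root.getChildren()) {
      count += countNodesOfType(child, nodeType);
    }
    return count;
  }

  // Builds the same tree with its two subtrees in either order.
  static Node sampleTree(boolean reversed) {
    Node left = new ViewNode(new ScanNode());
    Node right = new ViewNode(new ScanNode(), new ScanNode());
    return reversed ? new MergeNode(right, left) : new MergeNode(left, right);
  }

  public static void main(String[] args) {
    for (boolean reversed : new boolean[] {false, true}) {
      Node root = sampleTree(reversed);
      System.out.println(
          "merge=" + countNodesOfType(root, MergeNode.class)
              + " view=" + countNodesOfType(root, ViewNode.class)
              + " scan=" + countNodesOfType(root, ScanNode.class));
    }
  }
}
```

Because the count is a sum over all nodes, swapping the order of children does not change the result. Note that isInstance also matches subclasses, so counting a base type includes every node derived from it.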

2. Changed from Position-Based to Count-Based Verification

Before (Order-Dependent):

PlanNode f1Root = plan.getInstances().get(0)...;  // Assumes position 0
assertTrue(f1Root instanceof AggregationMergeSortNode);
assertTrue(f1Root.getChildren().get(0) instanceof DeviceViewNode);

After (Order-Independent):

// Count node types across all fragments
int totalAggregationMergeSortNodes = 0;
int totalDeviceViewNodes = 0;
int totalExchangeNodes = 0;
int totalFullOuterTimeJoinNodes = 0;

for (FragmentInstance instance : plan.getInstances()) {
  PlanNode root = instance.getFragment().getPlanNodeTree();
  totalAggregationMergeSortNodes += countNodesOfType(root, AggregationMergeSortNode.class);
  totalDeviceViewNodes += countNodesOfType(root, DeviceViewNode.class);
  totalExchangeNodes += countNodesOfType(root, ExchangeNode.class);
  totalFullOuterTimeJoinNodes += countNodesOfType(root, FullOuterTimeJoinNode.class);
}

// Verify the plan has the expected structure
assertEquals("Expected one AggregationMergeSortNode", 1, totalAggregationMergeSortNodes);
assertTrue("Expected at least two DeviceViewNodes", totalDeviceViewNodes >= 2);
assertTrue("Expected at least one ExchangeNode", totalExchangeNodes >= 1);
assertTrue("Expected at least one FullOuterTimeJoinNode", totalFullOuterTimeJoinNodes >= 1);

3. Made Assertions Flexible

Used >= instead of == for node counts that can legitimately vary with query optimizer decisions. Only the AggregationMergeSortNode count is checked exactly; the DeviceViewNode, ExchangeNode, and FullOuterTimeJoinNode counts are checked against a lower bound.
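
The trade-off between the two kinds of assertion can be illustrated with a small sketch. It is not part of the test: the class, the helper names (histogram, exactly, atLeast) and the two example node lists are hypothetical, and the strings stand in for node types collected from the plan trees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CountAssertionSketch {

  // Builds a histogram of node type names (hypothetical input; order is irrelevant).
  static Map<String, Long> histogram(List<String> nodeTypes) {
    return nodeTypes.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
  }

  // Exact check: suitable for counts that must not vary between plans.
  static boolean exactly(Map<String, Long> counts, String type, long expected) {
    return counts.getOrDefault(type, 0L) == expected;
  }

  // Lower-bound check: tolerates plans that contain additional nodes of this type.
  static boolean atLeast(Map<String, Long> counts, String type, long minimum) {
    return counts.getOrDefault(type, 0L) >= minimum;
  }

  public static void main(String[] args) {
    // Two invented plan variants that differ only in the number of ExchangeNodes.
    Map<String, Long> planA =
        histogram(Arrays.asList("AggregationMergeSortNode", "ExchangeNode"));
    Map<String, Long> planB =
        histogram(Arrays.asList("ExchangeNode", "AggregationMergeSortNode", "ExchangeNode"));

    for (Map<String, Long> plan : Arrays.asList(planA, planB)) {
      System.out.println(
          "exactlyOneExchange=" + exactly(plan, "ExchangeNode", 1)
              + " atLeastOneExchange=" + atLeast(plan, "ExchangeNode", 1));
    }
  }
}
```

A lower bound accepts both variants, which avoids false failures but also means the test no longer detects a plan that contains unexpectedly many nodes of that type.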

Verification

Ran the test with 40 NonDex runs and observed 0 failures (100% success rate):

mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex \
  -Dtest=AggregationDistributionTest#testAlignByDevice2Device3Region \
  -DnondexRuns=40
# Result: 0 failures

Key Changed Classes

  • AggregationDistributionTest:
    • Modified testAlignByDevice2Device3Region test method to use order-independent verification
    • Added countNodesOfType() helper method for recursive plan tree traversal

kabo87777 • Oct 19 '25 14:10