⚡️ Speed up method `DbtAdapter.build_parent_map` by 3,421%

Open misrasaurabh1 opened this issue 5 months ago • 3 comments

📄 3,421% (34.21x) speedup for DbtAdapter.build_parent_map in recce/adapter/dbt_adapter/__init__.py

⏱️ Runtime : 41.5 milliseconds → 1.18 milliseconds (best of 21 runs)

📝 Explanation and details

The optimized code achieves a 3,421% speedup through two changes:

Primary Optimization: Avoiding expensive to_dict() conversion

  • The original code calls manifest.to_dict(), which accounts for 98.4% of the profiled runtime (499ms out of 507ms)
  • The optimized version attempts to access manifest.parent_map directly, falling back to to_dict() only if needed
  • This eliminates the massive serialization overhead when the parent_map is already available as an attribute
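The attribute-first access with a to_dict() fallback could look roughly like the sketch below. It is not the actual recce code: FastManifest and LegacyManifest are hypothetical stand-ins for a manifest object, and the filtering loop is a simplified assumption that only mirrors the `if k not in node_ids` check quoted in this report.

```python
from typing import Dict, List


class FastManifest:
    """Hypothetical stand-in: a manifest that exposes parent_map as an attribute."""

    def __init__(self, parent_map: Dict[str, List[str]]):
        self.parent_map = parent_map
        self.to_dict_calls = 0

    def to_dict(self) -> dict:
        # Counts calls so we can see whether the expensive path was taken.
        self.to_dict_calls += 1
        return {"parent_map": self.parent_map}


class LegacyManifest:
    """Hypothetical stand-in: no parent_map attribute, only to_dict()."""

    def __init__(self, parent_map: Dict[str, List[str]]):
        self._parent_map = parent_map
        self.to_dict_calls = 0

    def to_dict(self) -> dict:
        self.to_dict_calls += 1
        return {"parent_map": self._parent_map}


def build_parent_map(manifest, nodes: Dict[str, dict]) -> Dict[str, List[str]]:
    # Prefer the attribute; serialize the whole manifest only if it is missing.
    try:
        raw_parent_map = manifest.parent_map
    except AttributeError:
        raw_parent_map = manifest.to_dict()["parent_map"]

    node_ids = nodes.keys()
    result = {}
    for k, parents in raw_parent_map.items():
        if k not in node_ids:
            continue
        # Assumption: parents outside the known node set are dropped as well.
        result[k] = [p for p in parents if p in node_ids]
    return result
```

With a manifest that has the attribute, to_dict() is never called; with one that lacks it, the fallback runs once and returns the same result.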

Secondary Optimization: Converting node_ids to a set

  • Changes node_ids = nodes.keys() to node_ids = set(nodes)
  • Membership tests (if k not in node_ids) are hash-based and O(1) on average for both a set and a dict_keys view, so this change does not alter the asymptotic cost
  • Consistent with that, this change shows only small gains in the profiler
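The membership cost of the two forms of node_ids can be checked empirically with a sketch like the following. The node names and the helper functions are made up for illustration; in CPython, `in` on a dict_keys view is a hash lookup into the underlying dict, so both forms are expected to behave similarly, and no timing figures are claimed here.

```python
import timeit

# Hypothetical node dictionary standing in for the `nodes` argument.
nodes = {f"model.pkg.m{i}": {} for i in range(10_000)}

keys_view = nodes.keys()  # live view backed by the dict's hash table
keys_set = set(nodes)     # independent snapshot of the keys


def same_membership(candidates) -> bool:
    """Return True if both containers agree on membership for every candidate."""
    return all((c in keys_view) == (c in keys_set) for c in candidates)


def time_lookup(container, key, number: int = 100_000) -> float:
    """Total seconds spent on `number` membership tests of `key` in `container`."""
    return timeit.timeit(lambda: key in container, number=number)
```

Comparing time_lookup(keys_view, "missing") with time_lookup(keys_set, "missing") on a given machine shows whether the conversion pays for itself. One behavioral difference to keep in mind: the view reflects keys added to nodes later, while the set is a snapshot taken at conversion time.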

Performance Impact:

  • Total runtime drops from 41.5ms to 1.18ms
  • The manifest.to_dict() bottleneck is completely eliminated in the common case
  • Membership checks against node_ids stay O(1) on average regardless of the number of nodes

Test Case Benefits: The optimizations are particularly effective for:

  • Large dbt projects with many nodes (where the to_dict() conversion is most expensive)
  • Repeated calls to build_parent_map (avoiding repeated expensive dict conversions)
  • Any scenario where the manifest object already has a parent_map attribute available

The try/except ensures backward compatibility while maximizing performance gains in the common case.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 🔘 None Found |
| ⏪ Replay Tests | 34 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⏪ Replay Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| test_pytest_tests__replay_test_0.py::test_recce_adapter_dbt_adapter___init___DbtAdapter_build_parent_map | 40.7ms | 233μs | 17297%✅ |

To edit these changes, run git checkout codeflash/optimize-DbtAdapter.build_parent_map-med9gwbs and push.

misrasaurabh1 Aug 19 '25 07:08

Codecov Report

:x: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| recce/adapter/dbt_adapter/__init__.py | 66.66% | 2 Missing :warning: |

| Files with missing lines | Coverage Δ |
| --- | --- |
| recce/adapter/dbt_adapter/__init__.py | 74.09% <66.66%> (-0.12%) :arrow_down: |

... and 9 files with indirect coverage changes

codecov[bot] Aug 19 '25 07:08

Should be ready to merge.

misrasaurabh1 Aug 21 '25 22:08

Hi misrasaurabh1, I would prefer to keep node_ids = nodes.keys() as is for readability. Could you please update it and fix the failed CI jobs? Thanks!

We also need a sign-off in the commit messages to pass the DCO check. You can sign off a commit with git commit --signoff or git commit -s.

Thanks!

wcchang1115 Aug 22 '25 02:08