geohashTools icon indicating copy to clipboard operation
geohashTools copied to clipboard

Phase 2: Optimize gh_covering Performance (2-3× Speedup)

Open dshkol opened this issue 5 months ago • 1 comments

Summary

This PR optimizes gh_covering by eliminating a redundant encode-decode cycle, achieving 2-3× speedup across typical use cases.

Problem

The previous implementation had an inefficient data flow:

Grid → gh_encode → gh_to_sf → gh_to_spdf → gh_to_sp → gh_decode → Build polygons

The encode-decode round-trip was wasteful since we only needed the decoded coordinates to build polygons.

Solution

Streamlined the flow to:

Grid → gh_encode → gh_decode → Build polygons (direct)

Now gh_covering decodes geohashes once and builds SpatialPolygons directly, skipping the intermediate conversions through gh_to_sfgh_to_spdfgh_to_sp.

Performance Results

Benchmark: 100 random points, 0.5° spread

Precision Old (ms) New (ms) Speedup
4 10.9 5.0 2.17×
5 15.9 6.6 2.41×
6 180.5 57.8 3.13×
7 5528.6 1802.5 3.07×

Median speedup: ~2.7×

Higher precisions show larger absolute time savings (e.g., precision 7 saves ~3.7 seconds).

Testing

✅ All 48 existing tests pass
✅ No breaking changes to API or behavior
✅ Same output as before, just faster
✅ Benchmark scripts included in benchmarks/

Changes

  • R/gis_tools.R: Refactored gh_covering to build polygons directly from decoded coordinates
  • NEWS.md: Documented performance improvement
  • benchmarks/: Added microbenchmark scripts for reproducibility

Verification

# Run benchmarks yourself
source("benchmarks/gh_covering_bench_simple.R")

# Run tests
devtools::test()

Next Steps

This is part of a series of optimization PRs:

  • ✅ Phase 1: Quick fixes and code quality (#43)
  • ✅ Phase 2: gh_covering optimization (this PR)
  • 🔜 Phase 3: Duplicate detection optimization
  • 🔜 Phase 4: CI/CD modernization (GitHub Actions)

🤖 Generated with Claude Code

dshkol avatar Nov 11 '25 05:11 dshkol

A jj chain should be the easiest way to prevent headache from merge conflicts among this sequence of PRs as review suggestions are incorporated / each branch drifts from the state as filed:

https://github.com/jj-vcs/jj

It seems Claude can basically get the gist of how to use it too even though it's relatively new:

https://claude.ai/share/e53b881a-f456-44dc-ae0c-2d96936cf366

MichaelChirico avatar Nov 11 '25 07:11 MichaelChirico