chore: release memory to the OS every hour

Open · nikhilsinhaparseable opened this issue 3 months ago • 1 comment

Summary by CodeRabbit

  • Performance & Optimization

    • Improved memory management across query paths with batched response processing and explicit memory-release steps for more stable large-response handling
    • Added a periodic memory-release scheduler to reduce memory retention under load
  • New Features

    • Exposed a public memory management module with functions to trigger and initialize periodic memory release
  • Chores

    • Added allocator-related dependencies to support the new memory management features
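
The periodic scheduler described in the summary can be sketched with the standard library alone. This is a minimal sketch, not the PR's implementation: `ReleaseScheduler` and `spawn_periodic_release` are hypothetical names, the release step is a caller-supplied closure rather than a jemalloc call, and a plain background thread stands in for whatever scheduler the PR actually uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle for a background release loop (not the PR's type).
pub struct ReleaseScheduler {
    stop_tx: Sender<()>,
    handle: JoinHandle<u64>,
}

impl ReleaseScheduler {
    /// Signals the loop to stop and returns how many times `release` ran.
    pub fn stop(self) -> u64 {
        // Ignore a send error: the thread may already have exited.
        let _ = self.stop_tx.send(());
        self.handle.join().expect("release thread panicked")
    }
}

/// Runs `release` once per `interval` until the scheduler is stopped.
pub fn spawn_periodic_release<F>(interval: Duration, mut release: F) -> ReleaseScheduler
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        let mut runs = 0u64;
        loop {
            match stop_rx.recv_timeout(interval) {
                // Timed out with no stop signal: run another release pass.
                Err(RecvTimeoutError::Timeout) => {
                    release();
                    runs += 1;
                }
                // Stop requested, or the handle was dropped.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break runs,
            }
        }
    });
    ReleaseScheduler { stop_tx, handle }
}
```

Passing `Duration::from_secs(3600)` would give the hourly cadence named in the PR title. Waiting on the stop channel with a timeout, instead of sleeping, lets the loop exit promptly on shutdown.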

nikhilsinhaparseable · Oct 25 '25 03:10

Walkthrough

Adds jemalloc as the global allocator, introduces a memory release module and periodic scheduler, and applies memory-conscious refactors across HTTP query response generation, Arrow utilities, and startup/init flows to reduce retained memory during processing and streaming.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Allocator & deps: Cargo.toml, src/main.rs | Added tikv-jemalloc-ctl, tikv-jemallocator, tikv-jemalloc-sys and configured tikv_jemallocator::Jemalloc as the global allocator. |
| Memory module & scheduler: src/memory.rs, src/lib.rs | New public memory module with force_memory_release() and init_memory_release_scheduler(); exposes pub mod memory from lib. |
| Server init updates: src/handlers/http/modal/*.rs, src/handlers/http/modal/server.rs | Init flows now call init_memory_release_scheduler() during startup (ingest, query, parseable servers), propagating errors from scheduler init. |
| Query handling memory changes: src/handlers/http/query.rs | Reworked non-streaming and NDJSON streaming paths to create intermediate response objects, convert JSON in chunks, explicitly drop intermediates, and call memory-release hooks to minimize retained memory. |
| Response batching: src/response.rs | to_json refactored to process record batches in fixed-size chunks (100), accumulate results per-batch, and remove reliance on itertools; overall batched JSON conversion implemented. |
| Arrow utils: src/utils/arrow/mod.rs | Short-circuit on empty input, pre-allocate buffer capacity, use Cursor + serde_json::from_reader with proper error propagation instead of prior unwrap behavior. |
| Metastore minor refactor: src/metastore/metastores/object_store_metastore.rs | Simplified delete_overview() to a single-line storage.delete_object(&path).await? call; no behavioral change. |
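
The fixed-size chunking described for to_json can be illustrated with a generic helper. This is a sketch under stated assumptions rather than the code in src/response.rs: `convert_in_chunks` and `CHUNK_SIZE` are hypothetical names, and plain slices stand in for Arrow record batches and JSON values.

```rust
/// Chunk size mirroring the value quoted in the change summary.
const CHUNK_SIZE: usize = 100;

/// Converts `rows` chunk by chunk so that each temporary buffer holds at
/// most `CHUNK_SIZE` items and is consumed before the next chunk starts.
pub fn convert_in_chunks<T, U, F>(rows: &[T], mut convert: F) -> Vec<U>
where
    F: FnMut(&T) -> U,
{
    let mut out = Vec::with_capacity(rows.len());
    for chunk in rows.chunks(CHUNK_SIZE) {
        // Intermediate buffer for this chunk only.
        let converted: Vec<U> = chunk.iter().map(&mut convert).collect();
        // `extend` consumes `converted`, so its allocation is freed here
        // instead of living until the end of the function.
        out.extend(converted);
    }
    out
}
```

For example, `convert_in_chunks(&rows, |n| n.to_string())` over 250 rows runs three passes of 100, 100 and 50 items. The final vector is still as large as the full result; chunking only bounds the size of the intermediates.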

Sequence Diagram(s)

sequenceDiagram
    participant Client as HTTP Client
    participant Handler as Query Handler
    participant Response as QueryResponse
    participant Memory as Memory Module
    participant Jemalloc as Jemalloc

    Client->>Handler: HTTP query request
    Handler->>Handler: Execute query, build QueryResponse
    Handler->>Response: to_json() (batched, 100-record chunks)
    loop per batch
        Response->>Response: Convert batch -> JSON values
        Response->>Response: Apply fill_null
        Response->>Response: Drop intermediate JSON
    end
    Response-->>Handler: Return JSON/stream frames
    Handler->>Memory: optional force_memory_release() (rate-limited)
    Memory->>Jemalloc: advance epoch + purge arenas
    Jemalloc-->>Memory: freed unused memory
    Handler->>Client: send HTTP response
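
The "rate-limited" qualifier on force_memory_release() in the diagram suggests a guard that skips calls arriving too soon after the previous one. The sketch below shows one way such a guard could look. `RateLimitedRelease` and `try_release` are hypothetical names, the interval is an assumption, and the release action is a closure supplied by the caller, not the jemalloc epoch and purge calls themselves.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical guard that lets an expensive action run at most once per
/// `min_interval`; calls inside the window return immediately.
pub struct RateLimitedRelease {
    min_interval: Duration,
    last_run: Mutex<Option<Instant>>,
}

impl RateLimitedRelease {
    pub fn new(min_interval: Duration) -> Self {
        Self {
            min_interval,
            last_run: Mutex::new(None),
        }
    }

    /// Runs `release` if enough time has passed since the last run.
    /// Returns `true` when `release` was actually invoked.
    pub fn try_release<F: FnOnce()>(&self, release: F) -> bool {
        let now = Instant::now();
        {
            let mut last = self.last_run.lock().unwrap();
            if let Some(prev) = *last {
                if now.duration_since(prev) < self.min_interval {
                    return false; // still inside the quiet window
                }
            }
            *last = Some(now);
        } // lock is dropped before the potentially slow release call
        release();
        true
    }
}
```

A handler could hold one shared guard and call `try_release` after each large response; most calls then cost a single mutex lock and a timestamp comparison.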

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Review memory module correctness (jemalloc epoch/arena calls) and error handling in src/memory.rs.
  • Verify global allocator integration in src/main.rs and Cargo.toml versions/compatibility.
  • Validate batched JSON logic in src/response.rs for correctness and performance.
  • Confirm no missed code paths for memory-release calls in src/handlers/http/query.rs.
  • Check Arrow parsing changes for correct error propagation in src/utils/arrow/mod.rs.
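
The pattern under review for the Arrow utilities (short-circuit on empty input, pre-allocate, read through a Cursor, return errors instead of unwrapping) can be sketched with the standard library alone. This is an illustration only: `parse_rows` is a hypothetical function, and newline-delimited integers stand in for the JSON that the PR decodes with serde_json.

```rust
use std::error::Error;
use std::io::{BufRead, Cursor};

/// Parses newline-delimited integers from an in-memory buffer.
/// Stand-in for the JSON decoding step; the PR uses serde_json instead.
pub fn parse_rows(buf: &[u8]) -> Result<Vec<i64>, Box<dyn Error>> {
    // Short-circuit: nothing to parse, nothing to allocate.
    if buf.is_empty() {
        return Ok(Vec::new());
    }
    // Capacity hint: one row per newline, plus a possible unterminated last row.
    let estimated_rows = buf.iter().filter(|&&b| b == b'\n').count() + 1;
    let mut rows = Vec::with_capacity(estimated_rows);

    for line in Cursor::new(buf).lines() {
        let line = line?; // I/O or UTF-8 errors are returned, not unwrapped
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        rows.push(trimmed.parse::<i64>()?); // parse errors are returned too
    }
    Ok(rows)
}
```

With `?` in place of `unwrap`, malformed input becomes an `Err` that the caller can map to an HTTP error rather than a panic in the request path.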

Suggested labels

for next release

Suggested reviewers

  • parmesant

Poem

🐰 I nibble bytes and tidy heaps,

I shuffle chunks while everyone sleeps,
Jemalloc hums, I call release,
Batches shrink and memory's at peace,
Hooray — light-footed server leaps! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description is entirely absent; no description content was provided by the author. The repository template requires a description section that outlines the goal of the PR, discusses possible solutions and the rationale for the chosen approach, and details key changes made in the patch. Additionally, the template includes a checklist requiring confirmation of testing and documentation. With zero description content provided, the PR fails to meet the repository's documentation requirements for pull requests. | The author should add a comprehensive pull request description following the repository template. This should include the goal of implementing periodic memory release to the OS, an explanation of the chosen solution (jemalloc with AsyncScheduler), a summary of key changes (new memory module, dependency additions, server initialization updates), and confirmation that the changes have been tested and that documentation and explanatory comments have been added where appropriate. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The pull request title "chore: release memory to the OS every hour" directly and accurately describes the primary objective of the changeset. The title is concise, specific, and clearly communicates the main feature being added (periodic memory release scheduling via jemalloc), which aligns with the substantial changes across multiple files including the new memory module, dependency additions, and initialization updates in various server components. The title uses no noise or vague language and provides enough context for a developer scanning history to understand the core change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] · Oct 25 '25 03:10