
feat: Free gpu space after each inference run

Open hh-space-invader opened this issue 10 months ago • 2 comments

This PR introduces two improvements to memory management in ONNX Runtime:

  • Mitigating Memory Fragmentation:

    • When running a high volume of inferences, the ONNX Runtime memory arena can become fragmented, leading to inefficient memory usage.
    • Enabling memory.enable_memory_arena_shrinkage forces cleanup of the memory arena after each run, reducing fragmentation at the cost of some performance.
  • Optimizing Memory Allocation Strategy:

    • By default, ONNX Runtime’s arena allocator extends memory using kNextPowerOfTwo, so each extension grows in powers of two, which can lead to excessive memory consumption (e.g., 1 GB → 2 GB → 4 GB).
    • Switching to kSameAsRequested makes each extension match the size actually requested, preventing unnecessary over-allocation.
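
For readers unfamiliar with these two settings, here is a minimal sketch of how they can be passed to ONNX Runtime from Python. It is not the code in this PR: the helper names (`cuda_provider_options`, `shrinkage_run_config`, `run_with_shrinkage`) are hypothetical, and the sketch assumes the CUDA execution provider and a single GPU identified by `device_id`.

```python
from __future__ import annotations

from typing import Any


def cuda_provider_options(device_id: int = 0) -> dict[str, Any]:
    """CUDA provider options (hypothetical helper).

    arena_extend_strategy="kSameAsRequested" asks the arena to grow by the
    requested size instead of the default kNextPowerOfTwo.
    """
    return {
        "device_id": device_id,
        "arena_extend_strategy": "kSameAsRequested",
    }


def shrinkage_run_config(device_id: int = 0) -> dict[str, str]:
    """Per-run config entry that enables arena shrinkage (hypothetical helper).

    The value names the device(s) whose arena should be shrunk after the run.
    """
    return {"memory.enable_memory_arena_shrinkage": f"gpu:{device_id}"}


def run_with_shrinkage(model_path: str, inputs: dict[str, Any], device_id: int = 0):
    """Run one inference with both settings applied (hypothetical helper)."""
    # Imported lazily so the helpers above work without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path,
        providers=[("CUDAExecutionProvider", cuda_provider_options(device_id))],
    )

    run_options = ort.RunOptions()
    for key, value in shrinkage_run_config(device_id).items():
        run_options.add_run_config_entry(key, value)

    # None requests all model outputs.
    return session.run(None, inputs, run_options=run_options)
```

In a real integration the session would be created once and reused across runs; only the `RunOptions` object needs to accompany each `run` call. Shrinking after every run trades some latency for a smaller steady-state footprint, so it may be worth making optional.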

All Submissions:

  • [ ] Have you followed the guidelines in our Contributing document?
  • [ ] Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  • [ ] Does your submission pass the existing tests?
  • [ ] Have you added tests for your feature?
  • [ ] Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

New models submission:

  • [ ] Have you added an explanation of why it's important to include this model?
  • [ ] Have you added tests for the new model? Were canonical values for tests computed via the original model?
  • [ ] Have you added the code snippet for how canonical values were computed?
  • [ ] Have you successfully run tests with your changes locally?

hh-space-invader avatar Mar 04 '25 10:03 hh-space-invader

[!IMPORTANT]

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • dev
  • master
  • main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



coderabbitai[bot] avatar Mar 04 '25 10:03 coderabbitai[bot]
