
[Feature]: Add CLI to launch FastEmbed as an Embedding API service (OpenAI-compatible)

Open · magicbrighter opened this issue 2 months ago · 2 comments

What feature would you like to request?

Title

CLI to start FastEmbed as a local HTTP service providing /v1/embeddings endpoint


Background / Motivation

FastEmbed is currently a Python library: users must write Python code to create embeddings.
In many production setups:

  • Teams want to run FastEmbed as a self-contained service (like the OpenAI embeddings API), locally or on an internal server.
  • Many existing apps/tools (LangChain, LlamaIndex, etc.) expect an OpenAI-style /v1/embeddings REST API, so they can work without code changes.
  • This would also make FastEmbed easier to integrate into non-Python environments.

Proposed Solution

Add a CLI command (e.g. fastembed serve) to start an HTTP server.
The server would expose a REST API compatible with OpenAI's embeddings format:

CLI Example

fastembed serve \
  --model-name BAAI/bge-small-zh-v1.5 \
  --device cuda \
  --model-path /path/to/local/model \
  --port 8080 \
  --host 0.0.0.0

Optional flags:

  • --model-path : Path to a local ONNX/quantized model file (avoids re-downloading)
  • --model-name : HuggingFace model name (falls back to remote download if no local path is given)
  • --device : cpu / cuda
  • --port : Listening port
  • --host : Bind address

API Specification (POST /v1/embeddings)

Request:

{
  "input": [
    "Artificial intelligence is the future",
    "FastEmbed makes embeddings blazing fast"
  ]
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.0246, -0.0536, -0.0010, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0123, -0.0456, 0.0231, ...]
    }
  ],
  "model": "BAAI/bge-small-zh-v1.5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
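
For illustration, here is how a client could exercise the proposed endpoint with plain requests (host and port taken from the CLI example above; the server itself does not exist yet):

import requests

# POST an OpenAI-style payload to the proposed local endpoint.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": ["Artificial intelligence is the future"]},
)
print(resp.json()["data"][0]["embedding"][:3])  # first few dimensions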

Benefits

  • Drop-in replacement for OpenAI embeddings: developers can point existing code at a local FastEmbed service by changing only OPENAI_API_BASE (see the client sketch after this list).
  • Works in multi-language environments (Java, Go, JS, etc.) via HTTP.
  • No cloud dependency: the model runs locally, preserving data privacy.
  • Easy deployment:
    • Local dev via CLI
    • Docker container for production use (docker run ... fastembed serve)
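
As a sketch of the drop-in claim: the official OpenAI Python client pointed at a local FastEmbed service (the base URL matches the CLI example above; the API key value is arbitrary, assuming the local server would not check it):

from openai import OpenAI

# Point the standard OpenAI client at the hypothetical local service.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.embeddings.create(
    model="BAAI/bge-small-zh-v1.5",
    input=["FastEmbed makes embeddings blazing fast"],
)
print(len(resp.data[0].embedding))  # vector dimension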

Alternatives

  • Users manually wrap FastEmbed in Flask/FastAPI (works, but requires extra coding in every project).
  • Keep embeddings Python-only, which limits cross-language integration.

Additional Context

  • Similar approach in Ollama (ollama serve) and HuggingFace Inference Server.
  • Many users (including me) have internal projects that already call /v1/embeddings; with this feature, switching from OpenAI to FastEmbed would require zero code changes.
  • Could later be extended with a /v1/rerank endpoint for ColBERT / BGE reranking models.

Possible Implementation

Use FastAPI / Uvicorn internally:

from fastapi import FastAPI
from fastembed import TextEmbedding

MODEL_NAME = "BAAI/bge-small-zh-v1.5"

app = FastAPI()
embedder = TextEmbedding(model_name=MODEL_NAME)

@app.post("/v1/embeddings")
async def create_embeddings(request: dict):
    # OpenAI's API accepts either a single string or a list of strings.
    inputs = request.get("input", [])
    if isinstance(inputs, str):
        inputs = [inputs]
    vectors = list(embedder.embed(inputs))
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": vec.tolist()}
            for i, vec in enumerate(vectors)
        ],
        "model": MODEL_NAME,
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
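
A minimal sketch of how the serve command could wire the proposed flags to Uvicorn (flag names follow the CLI example above; the import string passed to uvicorn.run is a placeholder, since none of this exists in FastEmbed yet):

import argparse
import uvicorn

def main():
    parser = argparse.ArgumentParser(prog="fastembed serve")
    parser.add_argument("--model-name", default="BAAI/bge-small-zh-v1.5")
    parser.add_argument("--model-path", default=None)
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--host", default="127.0.0.1")
    args = parser.parse_args()
    # An app factory would construct TextEmbedding from these flags;
    # "fastembed.cli.serve:app" is a hypothetical module path.
    uvicorn.run("fastembed.cli.serve:app", host=args.host, port=args.port)

if __name__ == "__main__":
    main()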

Is there any additional information you would like to provide?

No response

magicbrighter · Nov 05 '25

A summary of the changes CodeRabbit can apply:

  • Implement a new CLI command to run FastEmbed as an OpenAI-compatible embedding API server: add fastembed/__main__.py and a fastembed/cli package (__init__.py, serve.py implementing FastAPI endpoints /v1/embeddings, /v1/models, and /health, a README, and a simple import test), add docs/CLI_Server_Guide.md, and modify pyproject.toml to register the fastembed console script and an optional [tool.poetry.group.server] dependency group (FastAPI, Uvicorn, Pydantic).


coderabbitai[bot] · Nov 05 '25

Bump

recursingfeynman · Nov 12 '25