
Huggingface-TEI reranker not showing up as option

Open rafaol opened this issue 10 months ago • 4 comments


Relevant environment info

- OS: macOS Sequoia 15.4.1
- Hardware: MacBook Pro with Apple M2
- Continue version: 1.0.6
- IDE version: VSCode 1.99.3
- Model:
- config:
  
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
  - name: Qwen2.5 1.5b Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
  - name: Nomic Text Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
  - name: MXBAI Embed
    provider: ollama
    model: mxbai-embed-large
    roles:
      - embed
  - name: TEI Reranker
    provider: huggingface-tei
    apiBase: http://localhost:8088
    model: BAAI/bge-reranker-v2-m3
    roles:
      - rerank
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
    params:
      nRetrieve: 32
      nFinal: 16
      useReranking: true
  - provider: web
  - provider: url
  - provider: repo-map
    params:
      includeSignatures: false

Description

I have been trying to use a local reranker via Huggingface's Text Embeddings Inference (TEI). TEI was installed locally via cargo with support for Apple's Metal. It is running fine, and I can verify its output, as shown below.

% text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088
2025-04-23T03:07:23.164769Z  INFO text_embeddings_router: router/src/main.rs:185: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8088, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-04-23T03:07:23.167464Z  INFO hf_hub: /Users/[omitted]/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/Users/[omitted]/.cache/huggingface/token"    
2025-04-23T03:07:23.170008Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-04-23T03:07:23.170017Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-04-23T03:07:23.490620Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2025-04-23T03:07:26.176065Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-04-23T03:07:26.589347Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2025-04-23T03:07:26.589382Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-04-23T03:07:26.589636Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-04-23T03:07:26.589728Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 3.4197675s
2025-04-23T03:07:26.815045Z  WARN text_embeddings_router: router/src/lib.rs:188: Could not find a Sentence Transformers config
2025-04-23T03:07:26.815063Z  INFO text_embeddings_router: router/src/lib.rs:192: Maximum number of tokens per request: 8192
2025-04-23T03:07:26.815078Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-04-23T03:07:27.997825Z  INFO text_embeddings_router: router/src/lib.rs:234: Starting model backend
2025-04-23T03:07:27.997841Z  INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
2025-04-23T03:07:27.998005Z  INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 163.542µs
2025-04-23T03:07:28.008655Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:249: Starting Bert model on Metal(MetalDevice(DeviceId(1)))
2025-04-23T03:07:31.483289Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1795: Starting HTTP server: 0.0.0.0:8088
2025-04-23T03:07:31.483304Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1796: Ready

After running:

curl -X POST http://localhost:8088/rerank  -H "Content-Type: application/json"  -d '{"query": "What is Python?", "texts": ["Python is a programming language.", "Java is a programming language."]}'

I get the expected response:

[{"index":0,"score":0.99958915},{"index":1,"score":0.002157342}]
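
For anyone scripting against the endpoint, the same check as the curl command above can be done from Python using only the standard library. This is a minimal sketch; the URL assumes the local TEI setup from this thread, and `build_rerank_payload`/`rerank` are illustrative helper names, not part of any library:

```python
import json
from urllib import request, error

# Assumed local TEI endpoint from this thread
TEI_URL = "http://localhost:8088/rerank"

def build_rerank_payload(query, texts):
    """Build the JSON body that TEI's /rerank endpoint expects."""
    return json.dumps({"query": query, "texts": texts}).encode("utf-8")

def rerank(query, texts, url=TEI_URL):
    """POST to /rerank and return the parsed list of {"index", "score"} dicts."""
    req = request.Request(
        url,
        data=build_rerank_payload(query, texts),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    try:
        print(rerank("What is Python?", ["Python is a programming language.",
                                         "Java is a programming language."]))
    except error.URLError:
        print("TEI server not reachable at", TEI_URL)
```

Scores should come back sorted by `index`, matching the curl output above.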

The model is added with the rerank role in my config.yaml for Continue, as shown above. No config errors are shown after saving the file; however, the model still doesn't show up as an available reranking option in the models configuration tab.

[Screenshot: models configuration tab]

To reproduce

  1. Install the Continue extension for VSCode on macOS Sequoia
  2. Install TEI locally with Metal support
  3. Run text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088 (or any other bge reranker)
  4. Add reranker to local assistant's config
  5. Try to select the rerank model in Continue's models tab (above the chat box)

rafaol avatar Apr 23 '25 03:04 rafaol

The previous version recognized it, but the latest version doesn't. I also found that the documentation here doesn't match the actual behavior; I don't know if the author forgot to update it. Even in the previous version I couldn't follow the documentation as written and had to model the config after other models to get it right. https://docs.continue.dev/customize/model-roles/reranking

wdyichen avatar Apr 27 '25 10:04 wdyichen

@rafaol I know why: you need to switch to the pre-release version (e.g. 1.1.26) and it will be recognized.

wdyichen avatar Apr 28 '25 05:04 wdyichen

> @rafaol I know why: you need to switch to the pre-release version (e.g. 1.1.26) and it will be recognized.

Thanks @wdyichen! That worked for now. Hopefully the next stable version will include this fix.

rafaol avatar Apr 28 '25 06:04 rafaol

Hi guys, I've followed your guideline but I'm still having an issue: 'hugging-face' doesn't seem to be recognized and throws an error when I pass it as the 'name' property. I'm using the continue.dev extension v1.1.26.

[Screenshot: config error]

jcqttran avatar Apr 28 '25 10:04 jcqttran

@jcqttran I recommend using a YAML-format configuration file. In older versions JSON also worked for me, but with the new version JSON is no longer handled correctly.

wdyichen avatar Apr 28 '25 11:04 wdyichen

> @jcqttran I recommend using a YAML-format configuration file. In older versions JSON also worked for me, but with the new version JSON is no longer handled correctly.

Thanks @wdyichen, I'll try YAML.

jcqttran avatar Apr 28 '25 12:04 jcqttran

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] avatar Aug 06 '25 02:08 github-actions[bot]

This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!

github-actions[bot] avatar Aug 17 '25 02:08 github-actions[bot]