CUDA out of memory when embedding and indexing corpus

Open dsyislearning opened this issue 5 months ago • 1 comment

Describe the bug: CUDA runs out of memory when I use the default Retriever Server with the multimodal embedding model colpali-v1.3-merged to embed 6492 images. I suspect this happens because the retriever reads the entire corpus at once and embeds it in a single batch. Could you please consider optimizing the tools (retriever.retriever_init, retriever.retriever_embed, retriever.retriever_index) so that the corpus is processed in smaller chunks? Thanks.
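To illustrate the kind of optimization I have in mind, here is a minimal sketch of chunked embedding. It is not the project's actual code: `iter_batches`, `embed_in_batches`, and `embed_fn` are hypothetical names, and it assumes the model returns one fixed-size vector per item (a multi-vector model would need a different output layout). The idea is to pass only a small batch to the model at a time and write each result straight into an on-disk `.npy` file, so neither the inputs nor the embeddings have to sit in memory all at once.

```python
from typing import Callable, Iterator, Sequence

import numpy as np


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    items: Sequence,
    embed_fn: Callable[[Sequence], np.ndarray],  # hypothetical: batch -> (len(batch), dim)
    dim: int,
    out_path: str,
    batch_size: int = 16,
) -> np.ndarray:
    """Embed `items` batch by batch, writing results into a memory-mapped .npy file."""
    # open_memmap creates a valid .npy file on disk and maps it into memory,
    # so the full embedding matrix never has to be held in RAM (or on the GPU).
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32, shape=(len(items), dim)
    )
    offset = 0
    for batch in iter_batches(items, batch_size):
        # Only this batch is sent to the model; the result is copied to disk
        # before the next batch is processed.
        vectors = np.asarray(embed_fn(batch), dtype=np.float32)
        out[offset:offset + len(batch)] = vectors
        offset += len(batch)
    out.flush()
    return out
```

With something like this, peak GPU memory would depend on `batch_size` rather than on the corpus size, and the resulting file can still be read back with `np.load` for indexing.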

To Reproduce: pipeline parameter files:

# courpus_index_parameter.yaml
retriever:
  corpus_path: data/corpus.jsonl
  cuda_devices: 4,5,6,7
  embedding_path: embedding/embedding.npy
  faiss_use_gpu: true
  index_chunk_size: 50000
  index_path: index/index.index
  infinity_kwargs:
    batch_size: 256
    bettertransformer: false
    device: cuda
    model_warmup: false
    pooling_method: auto
    engine: torch
  is_multimodal: true
  overwrite: true
  retriever_path: vidore/colpali-v1.3-merged

Sep 10 '25 13:09 dsyislearning

Thanks for reporting this! We’ll work on a fix and update as soon as possible.

Sep 11 '25 01:09 mssssss123