UltraRAG
CUDA out of memory when embedding and indexing corpus
Describe the bug
CUDA runs out of memory when using the default Retriever Server with the multimodal embedding model colpali-v1.3-merged to embed 6492 images. I suspect this is because the retriever attempts to read the entire corpus at once and embed it in a single batch. Could you please consider optimizing the tools (retriever.retriever_init, retriever.retriever_embed, retriever.retriever_index)? Thanks.
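For illustration, here is a minimal sketch of the kind of chunked embedding that might avoid the problem. It is not UltraRAG's actual code: `embed_in_batches`, `iter_batches`, and `embed_fn` are hypothetical names, and the sketch assumes each item maps to one fixed-size vector, which may not hold for multi-vector models. Only one batch is passed to the model at a time, and results are written straight into a disk-backed `.npy` file.

```python
from typing import Callable, Iterator, Sequence, Tuple

import numpy as np


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Tuple[int, Sequence]]:
    """Yield (start_index, batch) pairs covering items in order."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def embed_in_batches(
    items: Sequence,
    embed_fn: Callable[[Sequence], Sequence],  # hypothetical: batch -> (len(batch), dim) vectors
    dim: int,
    out_path: str,
    batch_size: int = 32,
) -> np.ndarray:
    """Embed items batch by batch, writing results into a disk-backed .npy file."""
    # open_memmap creates a valid .npy file that is filled incrementally,
    # so the full embedding matrix does not have to be built in memory first.
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32, shape=(len(items), dim)
    )
    for start, batch in iter_batches(items, batch_size):
        vectors = np.asarray(embed_fn(batch), dtype=np.float32)
        out[start:start + len(batch)] = vectors
    out.flush()
    return out
```

With this shape, peak GPU memory would depend on `batch_size` rather than on the corpus size, and the resulting file can be read back with `np.load(out_path)`.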
To Reproduce
Pipeline parameter file:
# courpus_index_parameter.yaml
retriever:
  corpus_path: data/corpus.jsonl
  cuda_devices: 4,5,6,7
  embedding_path: embedding/embedding.npy
  faiss_use_gpu: true
  index_chunk_size: 50000
  index_path: index/index.index
  infinity_kwargs:
    batch_size: 256
    bettertransformer: false
    device: cuda
    model_warmup: false
    pooling_method: auto
    engine: torch
  is_multimodal: true
  overwrite: true
  retriever_path: vidore/colpali-v1.3-merged
Thanks for reporting this! We’ll work on a fix and update as soon as possible.