Benjamin Clavié issues

Results: 16 issues of Benjamin Clavié

README Indexing fails on two GPUs

I'm not sure if the problem is related to Colab; I also get an error using Jupyter locally on my Ubuntu server. The basic `readme.md` example doesn't work and the...

bug

feat: export hf hub and vespa

Basic support for utilities to export a model from a path on disk to the Hugging Face Hub, as well as to convert the safetensors weights to VespaColBERT ONNX.

enhancement

Integrate DSPy as a third main API Class

Ongoing project. The goal is for RAGatouille to support more than just ColBERT, and build our way to UDAPDR support. Integrating DSPy is the next big milestone. No current definite...

enhancement

Support exporting index to HuggingFace Hub

Indexing is time-consuming, and people would often like to easily share pre-built indexes for various common datasets, for general-domain applications (Wikipedia, code documentation...) and evaluation...

enhancement

help wanted

good first issue

Improve Testing

Testing is currently very sparse. It's essentially just ensuring model loading works properly (not tested in all cases yet) and reproducing the notebooks as end2end tests to make sure a...

enhancement

help wanted

good first issue

ongoing

More examples & documentation

Self-explanatory, currently very barebones. Any contribution, be it documentation, more examples, or deeper tutorials, is very welcome.

documentation

help wanted

good first issue

ongoing

Rework Dependencies: ship with barebones dependencies & bundle different features as extras

Putting this out there as a way to alleviate the _many_ dependency issues. I'll soon be shipping a PLAID-free indexing method (compression-free too, but that will come later), which will alleviate the...

enhancement

help wanted

Support two indexing styles: ColBERT/PLAID style optimisation and HNSW-style uncompressed indexes

Currently, we only use the ColBERT-optimised indexes, or index-free in-memory encodings. For low-to-medium volumes of documents, not using the ColBERT optimisation can have advantages: easier CRUD, potentially better performance,...

enhancement

help wanted

Indexing expansion

Decouple Indexing and Encoding to support other late-interaction models

Currently, Indexing is handled by upstream ColBERT completely, and only accepts ColBERT outputs. We want to decouple the indexing stage and the document encoding stage, so that we can take...

enhancement

help wanted

Indexing expansion