Xueguang Ma 马雪光

Results 13 issues of Xueguang Ma 马雪光

For example, Tevatron uses Pyserini for evaluation, and EntityQuestion uses Pyserini for BM25 search. In those cases, it does not make sense to make users install faiss/torch/onnxruntime, etc. We can utilize transformers utils...

It would be good to add an IVF feature here: https://github.com/castorini/pyserini/blob/master/pyserini/index/faiss.py (see the comments from @t-k-: https://github.com/castorini/pyserini/issues/798#issuecomment-944807307). We also need to think about how to split the index at indexing time.

It would be good to support dense retrieval from multiple Faiss index shards. This would be friendlier to machines with limited RAM (and could later support GPU retrieval).
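A minimal sketch of the idea, not Pyserini's implementation: search each shard separately for its own top `k`, then merge the per-shard results into a global top `k`. The names `search_shard` and `search_shards` are hypothetical, and a brute-force inner product over plain Python lists stands in for a real Faiss index so the example stays self-contained.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

Hit = Tuple[float, str]                      # (score, docid)
Shard = Sequence[Tuple[str, Sequence[float]]]  # [(docid, vector), ...]


def search_shard(query: Sequence[float], shard: Shard, k: int) -> List[Hit]:
    """Brute-force inner-product search over one shard.

    Hypothetical stand-in for searching a single Faiss index shard.
    """
    scored = (
        (sum(q * d for q, d in zip(query, vec)), docid)
        for docid, vec in shard
    )
    return heapq.nlargest(k, scored)


def search_shards(query: Sequence[float], shards: Iterable[Shard], k: int) -> List[Hit]:
    """Search shards one at a time and merge their top-k lists.

    Because `shards` is consumed lazily, only one shard needs to be
    resident in memory at any moment.
    """
    per_shard = (search_shard(query, shard, k) for shard in shards)
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


# Toy data: two shards of 2-dimensional vectors.
shard_a = [("d1", [1.0, 0.0]), ("d2", [0.0, 1.0])]
shard_b = [("d3", [2.0, 0.0]), ("d4", [0.5, 0.5])]
hits = search_shards([1.0, 0.0], [shard_a, shard_b], k=2)
top_docids = [docid for _, docid in hits]
```

Taking the top `k` from every shard before merging is what keeps the merged result exact: any document in the global top `k` must also be in the top `k` of its own shard.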

We have our first demo example, https://github.com/castorini/pyserini/blob/master/pyserini/demo/msmarco.py, as described in https://github.com/castorini/pyserini/pull/546. Currently, it is a demo with sparse search only; we want it to support dense search and hybrid...

Currently, hybrid search retrieves `k` hits from dense, `k` hits from sparse, and returns `k` hits as the hybrid result. We should make it more configurable, i.e. different k for dense and sparse, and...
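One way the configurable version could look, as a sketch under stated assumptions rather than the actual Pyserini code: the function `hybrid_search`, its parameters `k_dense`, `k_sparse`, and `alpha`, and the fusion rule (dense score plus `alpha` times sparse score, with a missing score replaced by the lowest retrieved score on that side) are all illustrative choices.

```python
from typing import Dict, List, Tuple


def top_k(hits: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-scoring entries of a docid -> score mapping."""
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def hybrid_search(
    dense_hits: Dict[str, float],
    sparse_hits: Dict[str, float],
    k_dense: int,
    k_sparse: int,
    k: int,
    alpha: float = 0.1,
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse results with independent depths.

    Hypothetical fusion rule: dense + alpha * sparse. A document found
    by only one retriever gets the other retriever's minimum score.
    """
    dense = top_k(dense_hits, k_dense)
    sparse = top_k(sparse_hits, k_sparse)
    min_dense = min(dense.values(), default=0.0)
    min_sparse = min(sparse.values(), default=0.0)

    fused = {
        docid: dense.get(docid, min_dense) + alpha * sparse.get(docid, min_sparse)
        for docid in dense.keys() | sparse.keys()
    }
    # Sort by descending score, breaking ties by docid for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy scores standing in for real retriever output.
dense_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
sparse_scores = {"d2": 12.0, "d4": 10.0, "d5": 3.0}
results = hybrid_search(dense_scores, sparse_scores, k_dense=2, k_sparse=2, k=3)
ranked_docids = [docid for docid, _ in results]
```

With `k_dense=2` and `k_sparse=2`, `d3` and `d5` never enter the candidate pool, so the depth of each retriever is controlled separately from the size of the final list.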

https://github.com/castorini/pyserini/blob/master/pyserini/encode/__init__.py https://github.com/castorini/pyserini/blob/master/pyserini/search/faiss/_searcher.py We want to use the encoders defined in pyserini/encode; this needs a cleanup.

Add Contriever experiments for 2CR: 1. contriever on BEIR 2. mcontriever on MIRACL 3. contriever-ft on BEIR 4. mcontriever-ft on MIRACL. Indexes for 1) and 2) are ready on orca...