Ogundepo Odunayo
I tried using the tokenizer visualizer but it doesn't seem to work when I load the tokenizer using `AutoTokenizer.from_pretrained()`. Here's the error I'm getting below: ``` --------------------------------------------------------------------------- AttributeError Traceback (most...
Added support for the Yoruba language. Language code = 'yo'
Hi @luyug, any idea on how to fix this? 04/14/2022 15:48:04 - INFO - tevatron.trainer - Initializing Gradient Cache Trainer Traceback (most recent call last): File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 197, in...
- Updated requirements to use a more recent version of pygaggle and pyserini.
- The existing version of pyserini in the code cannot load Lucene indexes from the current Anserini...
The file [convert_trec_run_to_dpr_retrieval_run.py](https://github.com/castorini/pyserini/blob/master/pyserini/eval/convert_trec_run_to_dpr_retrieval_run.py) only allows the conversion of topics currently checked into anserini. I guess we can open this up to use custom query files also? https://github.com/castorini/pyserini/blob/2673031f6b202941fe0f9953c9b876e6d4f1e653/pyserini/eval/convert_trec_run_to_dpr_retrieval_run.py#L26-L37 I can see...
Initial PR based on https://github.com/castorini/pyserini/issues/1375. Modularize imports so that LuceneSearcher does not rely on Faiss, torch, and transformers.
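One generic way to keep a lightweight class free of heavy dependencies is to defer the import until the dependency is actually used. The sketch below illustrates that idea only; it is not taken from the PR, and `LazyModule` is a hypothetical helper name, not a Pyserini API.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first accessed.

    Hypothetical helper for illustration; not part of Pyserini.
    """

    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails, so this runs
        # for attributes that live on the wrapped module, not for _name/_module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Creating the proxy does not import anything yet; the import happens on first use.
json_lazy = LazyModule("json")
encoded = json_lazy.dumps([1, 2])
```

With this pattern, a missing optional package only raises `ModuleNotFoundError` when code that needs it runs, not when the enclosing package is imported.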
Could probably do with some redesign but here's a first pass at integrating MLX into Pyserini with ColBERT. The tests and test outputs are similar to the same tests for...
Possible preprocessing feature: preprocess unstructured text into passages, possibly using Pygaggle segmentation: https://github.com/castorini/pygaggle/blob/master/pygaggle/data/segmentation.py
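As a rough illustration of what such a feature might do, the sketch below splits text into overlapping sentence windows. It does not use or reproduce the Pygaggle segmentation module; the function name, the regex-based sentence splitter, and the default `window` and `stride` values are all assumptions made for the example.

```python
import re
from typing import List


def segment_into_passages(text: str, window: int = 3, stride: int = 2) -> List[str]:
    """Split text into overlapping passages of `window` sentences, advancing by `stride`."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    # Naive sentence split after ., ! or ? followed by whitespace (an assumption;
    # a real pipeline would likely use a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    passages = []
    for start in range(0, len(sentences), stride):
        passages.append(" ".join(sentences[start:start + window]))
        if start + window >= len(sentences):
            break  # this window already reached the end of the text
    return passages


passages = segment_into_passages("A. B. C. D. E.", window=3, stride=2)
```

Overlapping windows mean a sentence near a passage boundary still appears with some of its neighbours in at least one passage.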
Related issue relevant to Spacerini: https://github.com/castorini/pyserini/issues/1449