Zehan Li
Oh! I updated the version to 0.1.96 and everything works well. Thank you.
Hello again, I tested the bug and it appears again after a month... I'm using the newest versions of both packages:
```
pytorch-lightning==1.5.10
sentencepiece==0.1.96
```
The bug is triggered by...
Hi @yangky11, could you try switching the import order to see if that works?
```Python
import sentencepiece
import pytorch_lightning
```
Thank you so much! I changed the hostname to `http://localhost:9200` and it works. But when I run it to evaluate BM25, I get different scores across runs. For example,...
I see. It's fixed in the `beir` code but not yet included in the `examples`. I added a sleep and eventually got a consistent score.
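For context, here is a minimal sketch of why a sleep helps, assuming the run-to-run score differences come from querying Elasticsearch before the bulk-indexed documents are refreshed and searchable. The index name and toy corpus are hypothetical, and the client calls assume elasticsearch-py 8.x; this is not the actual `beir` fix.

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
index_name = "bm25-demo"  # hypothetical index name

# Toy corpus, bulk-indexed the way a BM25 evaluation script would do it.
actions = [{"_index": index_name, "_id": str(i), "txt": f"document {i}"} for i in range(100)]
helpers.bulk(es, actions)

# Without a refresh/sleep, a query issued right after bulk indexing may see only
# part of the corpus, so BM25 scores differ from run to run.
es.indices.refresh(index=index_name)
time.sleep(2)  # extra safety margin, mirroring the sleep mentioned above

resp = es.search(index=index_name, query={"match": {"txt": "document"}})
print(resp["hits"]["total"])
```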
Hi @txsun1997, thanks for the update! With the new code and hyperparameters, I can successfully replicate the results.
```
********* Evaluated on dev set *********
Dev loss: 1.0993. Dev perf:...
```
Hi, have you updated the pip package? I still have this problem (installed via `pip install tevatron`).
What do you think? 
Hi @ChenghaoMou, I'm facing the same problem with another local minhash deduplication implementation, which removes significantly fewer documents than the Spark implementation. See https://github.com/huggingface/datatrove/issues/107
I have tried another implementation from [starcoder](https://github.com/bigcode-project/bigcode-dataset/blob/main/near_deduplication/minhash_deduplication_spark.py), which produces nearly the same deduplication rate as the datatrove implementation. Is there any reason why bigcode didn't use the graphframes implementation in this code...
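To make the comparison concrete, here is a minimal, hypothetical sketch of the clustering step where implementations can diverge: once MinHash produces candidate duplicate pairs, the pairs have to be grouped into connected components (via graphframes, an iterative map-reduce scheme, or a plain union-find as below), and differences in that grouping change how many documents end up removed. None of this is the actual datatrove or bigcode code.

```python
from collections import defaultdict

def find(parent, x):
    # Path-halving find for the union-find structure.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_duplicates(pairs):
    """Group documents connected by any chain of duplicate pairs."""
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    clusters = defaultdict(set)
    for doc in parent:
        clusters[find(parent, doc)].add(doc)
    return list(clusters.values())

# Example: doc1-doc2 and doc2-doc3 collapse into one cluster of three documents
# (two of which get removed), plus a separate cluster {doc4, doc5}.
pairs = [("doc1", "doc2"), ("doc2", "doc3"), ("doc4", "doc5")]
print(cluster_duplicates(pairs))
```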