ValueError when running `evaluate_bm25.py`
Hi, I was trying to run your `evaluate_bm25.py` baseline, but I got the following error. There may be a problem with Elasticsearch. Could you please help me fix it?
2022-02-17 02:38:34 - Loading Queries...
2022-02-17 02:38:34 - Loaded 300 TEST Queries.
2022-02-17 02:38:34 - Query Example: 0-dimensional biomaterials show inductive properties.
2022-02-17 02:38:34 - Activating Elasticsearch....
2022-02-17 02:38:34 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'scifact', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 1, 'language': 'english'}
Traceback (most recent call last):
File "evaluate_bm25.py", line 64, in <module>
model = BM25(index_name=index_name, hostname=hostname, initialize=initialize, number_of_shards=number_of_shards)
File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/bm25_search.py", line 22, in __init__
self.es = ElasticSearch(self.config)
File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/elastic_search.py", line 34, in __init__
self.es = Elasticsearch(
File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py", line 312, in __init__
node_configs = client_node_configs(
File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 101, in client_node_configs
node_configs = hosts_to_node_configs(hosts)
File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 141, in hosts_to_node_configs
node_configs.append(url_to_node_config(host))
File "/anaconda/envs/beir/lib/python3.8/site-packages/elastic_transport/client_utils.py", line 198, in url_to_node_config
raise ValueError(
ValueError: URL must include a 'scheme', 'host', and 'port' component (ie 'https://localhost:9200')
For the hostname, I think you must pass a full URL such as http://localhost:9200, not just localhost. The error message asks for a scheme, a host, and a port.
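Judging by the `elasticsearch/_sync/client` path in the traceback, this is elasticsearch-py 8.x, which rejects a host string that lacks a scheme or a port. Below is a minimal sketch of normalizing the value before it reaches `BM25(..., hostname=hostname, ...)`. It assumes Elasticsearch listens on its default port 9200. `to_es_url` is a hypothetical helper, not part of BEIR or elasticsearch-py.

```python
from urllib.parse import urlsplit

# Assumption: Elasticsearch is listening on its default port.
DEFAULT_ES_PORT = 9200


def to_es_url(hostname: str, scheme: str = "http", port: int = DEFAULT_ES_PORT) -> str:
    """Hypothetical helper: turn 'localhost' into 'http://localhost:9200'."""
    # Prepend a scheme when the caller passed only a host (or host:port).
    if "://" not in hostname:
        hostname = f"{scheme}://{hostname}"
    parts = urlsplit(hostname)
    if not parts.hostname:
        raise ValueError(f"no host component in {hostname!r}")
    host = parts.hostname
    if ":" in host:  # IPv6 literal such as '::1' must be bracketed in a URL
        host = f"[{host}]"
    # Keep an explicit port if one was given, otherwise use the default.
    effective_port = parts.port if parts.port is not None else port
    return f"{parts.scheme}://{host}:{effective_port}"
```

In `evaluate_bm25.py` you would then set `hostname = to_es_url("localhost")`, which is equivalent to writing `http://localhost:9200` by hand.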
Thank you so much! I changed the hostname to http://localhost:9200 and it works. But when I run it to evaluate BM25, I get different scores on different runs. For example, NDCG@10 on the SciFact dataset ranges from 0.64 to 0.67. Do you know why? Is there any randomness in the BM25 algorithm?
This was addressed in https://github.com/UKPLab/beir/issues/58
I'm not sure whether the latest release already includes this fix. You can either update BEIR to the latest version from Git, or add a sleep after indexing the documents in your code.
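A likely reason the scores vary: Elasticsearch is near real-time. Newly indexed documents only become searchable after an index refresh, which by default happens about once per second. If queries run immediately after bulk indexing, some documents may still be invisible, and NDCG@10 changes from run to run. Below is a minimal sketch of the "wait before searching" idea, written against plain callables so it does not assume any particular BEIR internals. `index_then_search` is hypothetical, and the 2-second default is an arbitrary assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def index_then_search(
    index_fn: Callable[[], None],
    search_fn: Callable[[], T],
    refresh_fn: Optional[Callable[[], None]] = None,
    sleep_seconds: float = 2.0,  # assumption: longer than the default 1s refresh interval
) -> T:
    """Hypothetical helper: index documents, make them searchable, then search."""
    index_fn()
    if refresh_fn is not None:
        # Explicitly make newly indexed documents visible to search.
        refresh_fn()
    else:
        # Fallback suggested in this thread: just wait before querying.
        time.sleep(sleep_seconds)
    return search_fn()
```

With an elasticsearch-py client `es`, one option is to pass `refresh_fn=lambda: es.indices.refresh(index="scifact")` instead of relying on a fixed sleep, since an explicit refresh does not depend on guessing a long-enough delay.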
I see. It's fixed in the BEIR library code but not yet in the examples. I added a sleep after indexing and now get consistent scores.
Hi @jordane95,
Yes, with our next pip release this should hopefully no longer be an issue, and Elasticsearch BM25 should give consistent scores. Thanks for letting me know!
Kind regards, Nandan Thakur