
ValueError when running `evaluate_bm25.py`

Open jordane95 opened this issue 4 years ago • 5 comments

Hi, I was trying to run your evaluate_bm25.py baseline, but I got the following error. There may be some problem with elasticsearch. Could you please help me fix it?

2022-02-17 02:38:34 - Loading Queries...
2022-02-17 02:38:34 - Loaded 300 TEST Queries.
2022-02-17 02:38:34 - Query Example: 0-dimensional biomaterials show inductive properties.
2022-02-17 02:38:34 - Activating Elasticsearch....
2022-02-17 02:38:34 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'scifact', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 1, 'language': 'english'}
Traceback (most recent call last):
  File "evaluate_bm25.py", line 64, in <module>
    model = BM25(index_name=index_name, hostname=hostname, initialize=initialize, number_of_shards=number_of_shards)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/bm25_search.py", line 22, in __init__
    self.es = ElasticSearch(self.config)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/elastic_search.py", line 34, in __init__
    self.es = Elasticsearch(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py", line 312, in __init__
    node_configs = client_node_configs(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 101, in client_node_configs
    node_configs = hosts_to_node_configs(hosts)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 141, in hosts_to_node_configs
    node_configs.append(url_to_node_config(host))
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elastic_transport/client_utils.py", line 198, in url_to_node_config
    raise ValueError(
ValueError: URL must include a 'scheme', 'host', and 'port' component (ie 'https://localhost:9200')

jordane95 avatar Feb 17 '22 02:02 jordane95

For the hostname, I think you must use http://localhost (or http://localhost:9200), not just localhost.
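For illustration only, here is a minimal sketch of how a bare hostname could be turned into the full URL form that the error message asks for. The helper name `normalize_es_host` and its defaults (`http`, port 9200) are assumptions made for this example; they are not part of BEIR or the Elasticsearch client, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit


def normalize_es_host(hostname: str,
                      default_scheme: str = "http",
                      default_port: int = 9200) -> str:
    """Return a URL containing scheme, host, and port.

    Hypothetical helper: the defaults are assumptions and may not match
    your cluster (for example, a secured cluster would need https).
    """
    # Add a scheme first so urlsplit parses the host and port correctly.
    if "://" not in hostname:
        hostname = f"{default_scheme}://{hostname}"
    parts = urlsplit(hostname)
    # Fall back to the default port when the URL does not carry one.
    port = parts.port if parts.port is not None else default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"


if __name__ == "__main__":
    print(normalize_es_host("localhost"))
    print(normalize_es_host("http://localhost"))
```

The result could then be passed as the `hostname` variable in `evaluate_bm25.py` in place of plain `localhost`.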

nreimers avatar Feb 17 '22 08:02 nreimers

Thank you so much! I changed the hostname to http://localhost:9200 and it works. But when I run it to evaluate BM25, I get different scores on different runs. For example, the NDCG@10 score ranges from 0.64 to 0.67 on the scifact dataset. Do you know why? Is there any randomness in the BM25 algorithm?

jordane95 avatar Feb 17 '22 08:02 jordane95

This was addressed in https://github.com/UKPLab/beir/issues/58

Not sure if the latest release already includes this. You can either update BEIR to the latest version from Git, or add a sleep after you index the documents in your code.
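As a rough sketch of the "sleep after indexing" idea: instead of a fixed sleep, one could poll until the index reports the expected number of documents before running any queries. The function `wait_for_doc_count` and its parameters are hypothetical names for this example and are not taken from BEIR; the sketch assumes the varying scores come from searching before all indexed documents are visible.

```python
import time
from typing import Callable


def wait_for_doc_count(count_fn: Callable[[], int],
                       expected: int,
                       timeout: float = 30.0,
                       interval: float = 0.5) -> bool:
    """Poll count_fn until it reports at least `expected` documents.

    Hypothetical helper: returns True once the count is reached, or
    False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_fn() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Short pause between checks; this is the "sleep" from the thread.
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in counter that grows by 100 documents per call.
    state = {"docs": 0}

    def fake_count() -> int:
        state["docs"] += 100
        return state["docs"]

    print(wait_for_doc_count(fake_count, expected=300, interval=0.01))
```

With the official Python client, `count_fn` could presumably be something like `lambda: es.count(index="scifact")["count"]`, where `es` is your own `Elasticsearch` instance; treat that as an assumption to check against your client version.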

nreimers avatar Feb 17 '22 08:02 nreimers

I see. It's fixed in the beir code but not yet included in the examples. I added a sleep and now get a consistent score.

jordane95 avatar Feb 17 '22 10:02 jordane95

Hi @jordane95,

Yes, with our next pip update this should hopefully no longer be an issue, and Elasticsearch BM25 should produce consistent scores. Thanks for notifying me!

Kind regards, Nandan Thakur

thakur-nandan avatar Feb 18 '22 21:02 thakur-nandan