Poor retrieval performance when running run_retrieve_tevatron.sh
Hi, I tried to build the index of the wiki corpus using the script you provide in scripts/run_retrieve_tevatron.sh, but I found the retrieval evaluation performance to be very poor.
The commands I ran are:
```bash
# 1. Encode the corpus in 4 shards, one GPU per shard
#    (shard indices 0-3 match --dataset_number_of_shards 4).
for s in $(seq -f "%02g" 0 3)
do
CUDA_VISIBLE_DEVICES=${s} python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --fp16 \
  --per_device_eval_batch_size 128 \
  --passage_max_len 512 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa_corpus" \
  --dataset_split "train" \
  --dataset_number_of_shards 4 \
  --encode_output_path emb_bge_official/corpus_emb_${s}.pkl \
  --dataset_shard_index ${s} >${s}.log 2>&1 &
done
wait  # the shard jobs run in the background; all must finish before the next steps
# 2. Encode the queries.
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --query_prefix "Represent this sentence for searching relevant passages: " \
  --fp16 \
  --per_device_eval_batch_size 256 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa" \
  --dataset_split "subset_1000" \
  --encode_output_path query_hotpot_1000.pkl \
  --query_max_len 32 \
  --encode_is_query

# 3. Retrieve the top 200 passages per query.
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.search \
  --query_reps query_hotpot_1000.pkl \
  --passage_reps "emb_bge_official/corpus_emb*.pkl" \
  --depth 200 \
  --batch_size -1 \
  --save_text \
  --save_ranking_to hqa_official_rank_200_new.txt
```
After checking the tevatron implementation, I think it does not implement the max_p design described in the 'Similarity search' part of Section 2.1 of your paper. Would you mind providing your implementation of, and commands for, the similarity search that reproduces the BGE-large row of your Table 4? Thank you very much!
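
For context, here is what I would expect the max_p step to look like as post-processing of the ranking file above: group the retrieved chunks by their parent retrieval unit and keep only the maximum chunk score per unit. This is just a minimal sketch of my understanding, assuming the `--save_text` output is tab-separated `query_id`, `passage_id`, `score` lines and that each chunk id embeds its parent unit id as `<unit_id>_<chunk_idx>`; the id scheme and the `max_p_rerank` helper are my own assumptions, not taken from your code.

```python
from collections import defaultdict

def max_p_rerank(ranking_path: str, depth: int = 200):
    """Aggregate chunk-level scores to unit level: keep the best chunk score per unit."""
    best = defaultdict(dict)  # query_id -> {unit_id: best chunk score seen so far}
    with open(ranking_path) as f:
        for line in f:
            qid, pid, score = line.strip().split("\t")
            unit_id = pid.rsplit("_", 1)[0]  # hypothetical "<unit_id>_<chunk_idx>" scheme
            s = float(score)
            if s > best[qid].get(unit_id, float("-inf")):
                best[qid][unit_id] = s
    # For each query, rank retrieval units by their best chunk score.
    return {
        qid: sorted(units.items(), key=lambda kv: kv[1], reverse=True)[:depth]
        for qid, units in best.items()
    }

ranked = max_p_rerank("hqa_official_rank_200_new.txt")
```

If your implementation instead aggregates at a different stage (for example over a deeper candidate pool before truncating to the top 200), that might explain the gap I am seeing, so the exact commands would be very helpful.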