Poor retrieval performance when running run_retrieve_tevatron.sh
Hi, I tried to build the index of the wiki corpus using the script you provide in scripts/run_retrieve_tevatron.sh, but I found the retrieval evaluation performance to be very poor.
The commands I ran are:
```bash
# 1. Encode the corpus in 4 shards, one GPU per shard
#    (shard indices 0-3 match --dataset_number_of_shards 4).
for s in $(seq -f "%02g" 0 3)
do
CUDA_VISIBLE_DEVICES=${s} python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --fp16 \
  --per_device_eval_batch_size 128 \
  --passage_max_len 512 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa_corpus" \
  --dataset_split "train" \
  --dataset_number_of_shards 4 \
  --encode_output_path emb_bge_official/corpus_emb_${s}.pkl \
  --dataset_shard_index ${s} >${s}.log 2>&1 &
done
wait  # the shard jobs run in the background; all must finish before the next steps
# 2. Encode the queries.
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --query_prefix "Represent this sentence for searching relevant passages: " \
  --fp16 \
  --per_device_eval_batch_size 256 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa" \
  --dataset_split "subset_1000" \
  --encode_output_path query_hotpot_1000.pkl \
  --query_max_len 32 \
  --encode_is_query

# 3. Retrieve the top 200 passages per query.
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.search \
  --query_reps query_hotpot_1000.pkl \
  --passage_reps "emb_bge_official/corpus_emb*.pkl" \
  --depth 200 \
  --batch_size -1 \
  --save_text \
  --save_ranking_to hqa_official_rank_200_new.txt
```
After checking the tevatron implementation, I think it does not implement the max_p design described in the 'Similarity search' part of Section 2.1 of your paper. Would you mind providing your implementation of, and commands for, the similarity search that reproduces the BGE-large row of your Table 4? Thank you very much!
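
For context, here is what I would expect the max_p step to look like as post-processing of the ranking file above: group the retrieved chunks by their parent retrieval unit and keep only the maximum chunk score per unit. This is just a minimal sketch of my understanding, assuming the `--save_text` output is tab-separated `query_id`, `passage_id`, `score` lines and that each chunk id embeds its parent unit id as `<unit_id>_<chunk_idx>`; the id scheme and the `max_p_rerank` helper are my own assumptions, not taken from your code.

```python
from collections import defaultdict

def max_p_rerank(ranking_path: str, depth: int = 200):
    """Aggregate chunk-level scores to unit level: keep the best chunk score per unit."""
    best = defaultdict(dict)  # query_id -> {unit_id: best chunk score seen so far}
    with open(ranking_path) as f:
        for line in f:
            qid, pid, score = line.strip().split("\t")
            unit_id = pid.rsplit("_", 1)[0]  # hypothetical "<unit_id>_<chunk_idx>" scheme
            s = float(score)
            if s > best[qid].get(unit_id, float("-inf")):
                best[qid][unit_id] = s
    # For each query, rank retrieval units by their best chunk score.
    return {
        qid: sorted(units.items(), key=lambda kv: kv[1], reverse=True)[:depth]
        for qid, units in best.items()
    }

ranked = max_p_rerank("hqa_official_rank_200_new.txt")
```

If your implementation instead aggregates at a different stage (for example over a deeper candidate pool before truncating to the top 200), that might explain the gap I am seeing, so the exact commands would be very helpful.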