[Question]: How to get `meetingbank_test_3qa_pairs_summary_formated.json`?
Describe the issue
When I try to run the script experiments/llmlingua2/evaluation/scripts/compress.sh, the code for constructing ../../../results/meetingbank_short/origin/meetingbank_test_3qa_pairs_summary_formated.json seems to be missing. Similarly, I cannot find the construction code for ../../../results/longbench/origin/longbench_test_single_doc_qa_formated.json, ../../../results/zero_scrolls/origin/zero_scrolls_validation.json or ../../../results/gsm8k/origin/gsm8k_cot_example_all_in_one.json.
May I know how to construct these JSON-formatted data files? Thanks for your consideration!
Hi, @mzf666, thank you for raising the question.
We have provided meetingbank_test_3qa_pairs_summary_formated.json on Hugging Face. For LongBench, you can refer to the format_data scripts and the LongBench repo.
@mzf666 I figured out how to get the dataset into the appropriate format for compress.sh:
from datasets import load_dataset
import json
import os

out_dir = "results/meetingbank_short/origin"
out_path = os.path.join(out_dir, "meetingbank_test_3qa_pairs_summary_formated.json")
os.makedirs(out_dir, exist_ok=True)

# Download the test split and write it out only if the file is not already there.
if not os.path.exists(out_path):
    meeting_bank_comp = load_dataset("microsoft/MeetingBank-QA-Summary", split="test")
    # Use a context manager so the file is flushed and closed after writing.
    with open(out_path, "w") as f:
        json.dump(meeting_bank_comp.to_list(), f)
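If it helps, here is a small sketch of the same write-once pattern as a reusable helper, plus a read-back check that the file parses as a JSON list of records. The helper names `dump_records` and `load_records` and the record fields in the demo are my own placeholders, not part of the LLMLingua repo or the dataset schema, and I have not confirmed exactly what compress.sh expects beyond the file path.

```python
import json
import os
import tempfile


def dump_records(records, path):
    """Write a list of dict records to `path` as JSON.

    Returns True if the file was written, False if it already existed.
    (Hypothetical helper, not from the LLMLingua repo.)
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    if os.path.exists(path):
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
    return True


def load_records(path):
    """Read the JSON file back so the result can be inspected."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder records; the real ones would come from
    # load_dataset(...).to_list() as in the snippet above.
    demo = [{"id": 0, "text": "example"}]
    target = os.path.join(tempfile.mkdtemp(), "origin", "demo.json")
    dump_records(demo, target)
    loaded = load_records(target)
    print(type(loaded).__name__, len(loaded))
```

With the real data you would pass `meeting_bank_comp.to_list()` as `records` and the `results/.../origin/...json` path as `path`. Reading the file back once is a cheap way to catch a truncated or empty file before starting a long compression run.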