LLMLingua

[Question]: How to get `meetingbank_test_3qa_pairs_summary_formated.json`?

Open · mzf666 opened this issue 1 year ago · 2 comments

Describe the issue

When I try to run the script `experiments/llmlingua2/evaluation/scripts/compress.sh`, it seems that the code for constructing `../../../results/meetingbank_short/origin/meetingbank_test_3qa_pairs_summary_formated.json` is missing. Similarly, I cannot find the code that constructs `../../../results/longbench/origin/longbench_test_single_doc_qa_formated.json`, `../../../results/zero_scrolls/origin/zero_scrolls_validation.json`, or `../../../results/gsm8k/origin/gsm8k_cot_example_all_in_one.json`.

May I know how to construct these JSON data files? Thanks for your consideration!
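
The question names four input files. As a quick way to see which of them are still absent before launching `compress.sh`, here is a minimal Python sketch. The helper name `missing_inputs` is hypothetical, and it assumes the `../../../results/...` paths resolve relative to the directory passed as `base_dir` (for example, the directory `compress.sh` is run from).

```python
from pathlib import Path

# Input files named in this issue. Assumption: these relative paths resolve
# from whatever directory is passed as base_dir.
EXPECTED_INPUTS = [
    "../../../results/meetingbank_short/origin/meetingbank_test_3qa_pairs_summary_formated.json",
    "../../../results/longbench/origin/longbench_test_single_doc_qa_formated.json",
    "../../../results/zero_scrolls/origin/zero_scrolls_validation.json",
    "../../../results/gsm8k/origin/gsm8k_cot_example_all_in_one.json",
]


def missing_inputs(base_dir: str = ".") -> list[str]:
    """Hypothetical helper: return the expected paths that are not files under base_dir."""
    base = Path(base_dir)
    return [p for p in EXPECTED_INPUTS if not (base / p).is_file()]


if __name__ == "__main__":
    # Report every input that still needs to be constructed or downloaded.
    for path in missing_inputs():
        print(f"missing: {path}")
```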

mzf666 · Jul 24 '24 01:07

Hi, @mzf666, thank you for raising the question.

We have provided `meetingbank_test_3qa_pairs_summary_formated.json` on Hugging Face. For LongBench, you can refer to the `format_data` scripts and the LongBench repo.

pzs19 · Jul 30 '24 08:07

@mzf666 I figured out how to get the dataset into the format that `compress.sh` expects:

```python
import json
import os

from datasets import load_dataset

out_dir = "results/meetingbank_short/origin"
out_path = os.path.join(out_dir, "meetingbank_test_3qa_pairs_summary_formated.json")

os.makedirs(out_dir, exist_ok=True)
if not os.path.exists(out_path):
    # Download the test split of MeetingBank-QA-Summary from the Hugging Face Hub
    meeting_bank_comp = load_dataset("microsoft/MeetingBank-QA-Summary", split="test")
    # Write all examples as one JSON list of dicts; the context manager closes the file
    with open(out_path, "w") as f:
        json.dump(meeting_bank_comp.to_list(), f)
```
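
Before running `compress.sh`, it can help to confirm that the file written above is a single JSON list of per-example objects. The following is a minimal sketch; `summarize_formatted_json` is a hypothetical helper, and it assumes `compress.sh` reads the file with `json.load` and expects the list-of-objects layout that `json.dump(meeting_bank_comp.to_list(), ...)` produces.

```python
import json
from pathlib import Path


def summarize_formatted_json(path: str) -> tuple[int, list[str]]:
    """Hypothetical helper: return (number of records, sorted keys of the first record).

    json.load raises json.JSONDecodeError on a JSON Lines file with more than one
    record, so this also flags a file written in JSON Lines form by mistake.
    """
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    # Assumption: the expected layout is a list whose items are all JSON objects.
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError(f"{path}: expected a JSON list of objects")
    first_keys = sorted(records[0]) if records else []
    return len(records), first_keys
```

For example, call `summarize_formatted_json("results/meetingbank_short/origin/meetingbank_test_3qa_pairs_summary_formated.json")` from the directory the snippet above was run in. Dumping `to_list()` with `json.dump` avoids `Dataset.to_json`, which writes JSON Lines by default; if the consumer uses `json.load`, as this sketch assumes, a JSON Lines file with more than one record would fail to parse.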

cornzz · Aug 22 '24 23:08