Missing data preprocessing file and bug in MRC training task

Open • PhongNTDo opened this issue 3 years ago • 1 comment

Hi. I am trying to fine-tune the ViDeBERTa model for a machine reading comprehension (MRC) task on the ViQuAD dataset. However, the file Finetuning/QA/extractive-qa-mrc/utils/preprocess.py, which the provided code refers to, is missing.

Screenshot from 2023-04-24 17-53-23

I then loaded the data with the load_dataset function from the datasets library instead, and got an error during model training with the following code:

from transformers import (
    RobertaForQuestionAnswering,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

# batch_size, tokenizer, tokenized_train and tokenized_valid are assumed
# to be defined earlier; they are not shown in this snippet.
model_checkpoint = "Fsoft-AIC/videberta-base"
model = RobertaForQuestionAnswering.from_pretrained(model_checkpoint)

model_name = model_checkpoint.split("/")[-1]
args = TrainingArguments(
    f"{model_name}-finetuned-quad2.0",
    num_train_epochs=2.0,  # change the number of training epochs to get a better result
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    warmup_ratio=0.05,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    load_best_model_at_end=True,
    save_strategy="epoch",
    save_total_limit=5,
    # do_train=True,
    # do_eval=False,
    # push_to_hub=True,
)

data_collator = default_data_collator

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Screenshot from 2023-04-24 17-57-53

I look forward to your help in solving this problem.

Apr 24 '23 10:04 PhongNTDo

Hi @ThuanPhong1801,

Thanks for your interest in our repository.

We have uploaded the preprocess.py file for processing the ViQuAD dataset. You can find it in the folder fine-tuning/QA/utils. Please rerun the fine-tuning following our implementation in the file Machine_reading_comprehension.ipynb.
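For readers who want a rough idea of what such a preprocessing step typically involves, here is a minimal sketch. It is not the contents of preprocess.py. It assumes that the ViQuAD files follow the SQuAD-style nested JSON layout (data, paragraphs, qas), and the names flatten_squad and convert_file are hypothetical helpers introduced only for illustration.

```python
import json
from typing import Any, Dict, Iterator


def flatten_squad(raw: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per question from SQuAD-style nested JSON.

    Assumption: the input has the layout data -> paragraphs -> qas.
    """
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Unanswerable questions may have an empty answer list.
                answers = qa.get("answers", [])
                yield {
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "context": context,
                    "question": qa["question"],
                    # Column-oriented answers, as used by common extractive-QA examples.
                    "answers": {
                        "text": [a["text"] for a in answers],
                        "answer_start": [a["answer_start"] for a in answers],
                    },
                }


def convert_file(src_path: str, dst_path: str) -> int:
    """Write the flattened records as JSON Lines and return how many were written."""
    with open(src_path, encoding="utf-8") as f:
        raw = json.load(f)
    count = 0
    with open(dst_path, "w", encoding="utf-8") as out:
        for record in flatten_squad(raw):
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A file written this way can then be read with load_dataset("json", data_files=...) from the datasets library, which gives each example the id, context, question and answers columns that the tokenization step needs.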

Hope this helps!
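If the training error persists, the model class may also be worth checking. This is only a guess, since the screenshot alone does not confirm the cause: the snippet in the question hard-codes RobertaForQuestionAnswering, and that class may not match the architecture declared in the checkpoint's config. A minimal sketch that lets the transformers library pick the class from the config instead (load_qa_model is a hypothetical helper name):

```python
def load_qa_model(checkpoint: str):
    """Load a question-answering model and its tokenizer for a checkpoint.

    The Auto* classes read the checkpoint's config and instantiate the
    matching architecture, so no model family is hard-coded here.
    """
    # Imported inside the function so that defining it does not itself
    # require transformers to be installed.
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
    return model, tokenizer
```

Calling load_qa_model("Fsoft-AIC/videberta-base") and passing the returned model and tokenizer to Trainer would replace the first two lines of the snippet in the question. The rest of the training setup stays unchanged.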

Cheers, Cong Dao

Apr 24 '23 17:04 DaoTranbk