Missing data preprocessing file and bug in MRC training task
Hi. I am trying to fine-tune the ViDeBERTa model for an MRC task on the ViQuAD dataset. However, the file Finetuning/QA/extractive-qa-mrc/utils/preprocess.py, which the provided code refers to, is missing.

As a workaround, I loaded the data with the load_dataset function from the datasets library instead, and got an error during model training. This is the training code:
model_checkpoint = "Fsoft-AIC/videberta-base"
model = RobertaForQuestionAnswering.from_pretrained(model_checkpoint)
model_name = model_checkpoint.split("/")[-1]
args = TrainingArguments(
    f"{model_name}-finetuned-quad2.0",
    num_train_epochs=2.0,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    warmup_ratio=0.05,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    load_best_model_at_end=True,
    save_strategy="epoch",
    save_total_limit=5,
    # do_train = True,
    # do_eval = False,
    # change the number of training epochs to get a better result
    # push_to_hub=True,
)
from transformers import default_data_collator
data_collator = default_data_collator
trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

I look forward to your help in solving this problem.
Hi @ThuanPhong1801,
Thanks for your interest in our repository.
We have uploaded the preprocess.py file for ViQuAD dataset processing. You can find it in the fine-tuning/QA/utils folder. After adding it, please rerun the fine-tuning process following our implementation in Machine_reading_comprehension.ipynb.
Hope this helps!
Cheers, Cong Dao
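
For readers who want a rough idea of what a preprocessing step for extractive QA data involves, here is a minimal sketch. It is not the repository's preprocess.py. It assumes the ViQuAD files use the SQuAD-style nested JSON layout (data > paragraphs > qas), and the function name flatten_squad and the inline sample are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Iterator


def flatten_squad(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per question from a SQuAD-style dictionary.

    Assumption: the input has the layout
    {"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}.
    """
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Unanswerable questions may carry an empty answer list.
                answers = qa.get("answers", [])
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answers": {
                        "text": [a["text"] for a in answers],
                        "answer_start": [a["answer_start"] for a in answers],
                    },
                }


# Tiny hand-written sample in the assumed layout (not real ViQuAD data).
sample = {
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "Hanoi is the capital of Vietnam.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Vietnam?",
                            "answers": [{"text": "Hanoi", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}

records = list(flatten_squad(sample))
```

Each record keeps the character offset of the answer inside the context, which is what a tokenization step for extractive QA typically needs in order to compute start and end token positions. In recent versions of the datasets library, a list of such records can usually be wrapped with Dataset.from_list before tokenization.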