Error while creating predictions on held-out dataset
Steps to reproduce:
- Create a new dataset using the create_hf_dataset.py script.
- In the config, point to your finetuned model and the new dataset. We are using the XLMR model.
- Run `torchrun --nproc_per_node=1 scripts/predict.py -c examples/xlmr_base_test_20220411.yml`.

This throws the error below:
Traceback (most recent call last):
File "/local/home/desktop/Experiments/massive/scripts/predict.py", line 112, in
~~Hi @iamsimha, greetings. To resolve this error, you must point to the numerical mapping for your slots, e.g.: https://github.com/alexa/massive/blob/0d474f326086d01fa320e081e12a7cea5950cfe3/examples/mt5_base_t2t_mmnlu_20220720.yml#L34~~
~~Please let us know if that works. Thanks.~~
Ah, wait, maybe I read your traceback too quickly. Let me check into this a little further.
So in my local version of the huggingface-ified evaluation data, created using scripts/create_hf_dataset.py, each record has a `slots_str` key with an empty value. This key must be absent in your version of the evaluation data, right? There are two options: either (A) add the key to your data, or (B) change the code so that the collator and related components can work without it. Option B is the better long-term solution, but I'm not sure we'll have bandwidth on our side in the near term. Please let us know if Option A is workable; a rough sketch is below. Thanks!
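For concreteness, here is a minimal sketch of Option A using the `datasets` library. It assumes your held-out data is a Hugging Face dataset saved to disk with `save_to_disk`; both paths are placeholders for your local layout:

```python
# Minimal sketch of Option A. Assumptions: the held-out data is a Hugging
# Face dataset saved with save_to_disk, and both paths below are placeholders.
from datasets import load_from_disk

ds = load_from_disk("path/to/your/heldout_eval_data")

# Add the `slots_str` key the collator expects, with an empty string value,
# mirroring what scripts/create_hf_dataset.py produces for evaluation data.
ds = ds.map(lambda example: {"slots_str": ""})

ds.save_to_disk("path/to/your/heldout_eval_data_with_slots_str")
```

After that, point the dataset path in your config at the updated copy and re-run the predict command.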