OfirArviv
I have the same issue with flan-t5-xxl, ul2 and xglm. But when I ran the same code without 4-bit, using just LoRA, the model converged normally. So it is the...
I have the same problem. I fine-tuned the mt5-xxl, ul2 and xglm-7.5 models on 2 datasets, and the models managed to "learn" for a good number of steps, but usually after...
@yoavkatz @elronbandel when you read this, please go over it thoroughly. There were a lot of changes, so please make sure I didn't forget anything. These are the original MT-Bench prompts...
> Regarding naming, I think the right naming for the tasks are: `evaluation.response_rating` (single turn) `evaluation.response_rating.with_reference` `evaluation.response_selection` or `evaluation.response_preference` etc > > also in the task fields i would change...
> > Regarding naming, I think the right naming for the tasks are: `evaluation.response_rating` (single turn) `evaluation.response_rating.with_reference` `evaluation.response_selection` or `evaluation.response_preference` etc > > also in the task fields i would...
@yoavkatz I updated the catalog test Python to 3.9, as bam requires this as a minimum. I kept 3.8 for the core unitxt test so we can still claim...
> @yoavkatz I updated the catalog test Python to 3.9, as bam requires this as a minimum. I kept 3.8 for the core unitxt test so we can still...
It seems the data was moved here: https://huggingface.co/datasets/lmarena-ai/arena-hard-auto/tree/main/data/arena-hard-v0.1 Please check whether the file structure is the same and whether we can just point the card to the new directory.
Thanks for the reply. I already have the predictions from your evaluation records data. I'm only looking to evaluate the response. Per your instructions, I'm running vlmutil eval "Claude3-5V_Sonnet_MMBench_V11.xlsx" for...