Qingquan Song
> When I run the demo you present, the relative errors of SiLRTC and HaLRTC are consistently 1.
>
> I am not seeing any errors in the code or...
Hey @YannDubs, thank you so much for the prompt response! We're using an in-house OpenAI API call to query GPT-4 as the judge (which is roughly the same as the...
> This is very surprising indeed. Just to understand, why are you not using the default alpaca_eval 2?
>
> i.e. `alpaca_eval evaluate_from_model --model_configs 'mistral-7b-orpo'`
>
> ...
Is there also a way to avoid using the logprob API with `alpaca_eval_clf_gpt4_turbo` if we cannot access the logprobs, e.g., by still using the `alpaca_eval_gpt4` annotator? Thank you! Best regards,...
Hey @YannDubs, it's unfortunate that we cannot adapt the API to access the logprobs in our case, and it seems like `alpaca_eval evaluate_from_model --model_configs 'mistral-7b-orpo' --annotators_config 'alpaca_eval_clf_gpt4_turbo'` also needs to...
Hey @YannDubs, I'd like to reopen this issue. When I use `alpaca_eval evaluate_from_model --model_configs 'mistral-7b-orpo' --annotators_config 'alpaca_eval_gpt4_turbo_fn'` as suggested, the results are still high: ~31-36 length-controlled win rate and ~50 win rate...
Hey @YannDubs, thank you very much for the response! I'm pretty sure we're using the OpenAI API (though it is an in-house version from Microsoft), and we tried both GPT-4 and GPT-4...
Faced the same issue with `0.13.0.dev2024090300`, and I also have two other issues: 1) the master-branch example `convert_checkpoint.py` code has an issue as well when doing fp8 quantization (with either `use_fp8` or `use_fp8_rowwise`)...
For my case, it seems like the issue happens here:

```python
@property
def paged_kv_cache(self):
    return self._model_config.kv_cache_type == KVCacheType.PAGED
```

The `self._model_config.kv_cache_type` is a string `PAGED` (or others), but the...
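A minimal sketch of why this comparison silently fails, assuming `KVCacheType` is a standard Python `Enum` (the definition below is hypothetical and only illustrates the string-vs-enum mismatch; the actual TensorRT-LLM definition may differ):

```python
from enum import Enum

class KVCacheType(Enum):
    # Hypothetical members for illustration only.
    PAGED = "PAGED"
    CONTINUOUS = "CONTINUOUS"

# Comparing a plain string against an Enum member is always False,
# so a property like paged_kv_cache would return False even when
# kv_cache_type holds the string "PAGED".
print("PAGED" == KVCacheType.PAGED)   # False

# One possible fix: normalize the string to the enum before comparing.
print(KVCacheType("PAGED") is KVCacheType.PAGED)   # True
```

So if `kv_cache_type` is deserialized as a raw string, normalizing it (e.g., `KVCacheType(value)`) before the comparison would restore the expected behavior.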
@troy1729 Sounds reasonable to me. Assigned. Feel free to kick off the implementation, and ping us to discuss or review any issues. Thank you!