DriveLM baseline score

A reference point for the baseline model's scores would help my team and me as we develop our approach and compare its performance. Could you please provide the baseline model's scores on the validation dataset, or direct us to where we can find them?

Apr 23 '24 06:04 Lorraine-Kwok

Please see here

Apr 23 '24 07:04 ChonghaoSima

Thank you

Apr 23 '24 08:04 Lorraine-Kwok

Hi,

I've been looking at the language scores reported in your results and noticed that the baseline's language score is much lower than the GPT score, which is high. Additionally, the GPT scores for the sampled data and the test data are almost identical, leading to very similar final scores.

Could you shed some light on the following?

Why is there such a big difference in language scores between the baseline and GPT?

How can the GPT scores be so close for sampled and test data?

It seems a bit odd, and I'm trying to make sense of it. Any clarification would be appreciated.

Thanks!

Apr 24 '24 07:04 Lorraine-Kwok