jeff3071
We evaluate LLaMA on 100 examples from the [`SQuAD`](https://huggingface.co/datasets/squad) dataset using the [Open-evals](https://github.com/open-evals/evals) framework, which extends OpenAI's Evals to other language models. We consider the sentence immediately following the prompt...
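A minimal sketch of what such an evaluation loop might look like, independent of the framework. This assumes SQuAD's standard `context`/`question`/`answers` record fields; the `generate` callable is a hypothetical stand-in for whichever model call the harness makes:

```python
def build_prompt(example):
    # Format a SQuAD-style record into a simple QA prompt.
    return (
        f"Context: {example['context']}\n"
        f"Question: {example['question']}\n"
        "Answer:"
    )

def evaluate(examples, generate):
    # Exact-match accuracy against the gold answer strings,
    # case-insensitive after stripping whitespace.
    correct = 0
    for ex in examples:
        prediction = generate(build_prompt(ex)).strip()
        golds = ex["answers"]["text"]
        if any(prediction.lower() == g.lower() for g in golds):
            correct += 1
    return correct / len(examples)
```

With the Hugging Face `datasets` library, the 100-example subset can be obtained via `load_dataset("squad", split="validation[:100]")`; the actual prompt template and scoring rule used by Open-evals may differ.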