
We evaluate LLaMA on 100 examples from the [`SQuAD`](https://huggingface.co/datasets/squad) dataset using the [Open-evals](https://github.com/open-evals/evals) framework, which extends OpenAI's Evals to other language models. We consider the sentence immediately following the prompt...
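As a rough illustration of how such a score can be computed, the sketch below implements the standard SQuAD exact-match metric (lowercasing, stripping punctuation and articles, collapsing whitespace) over a handful of hypothetical predictions. This is a minimal sketch of the usual SQuAD normalization, not the actual scoring code used by Open-evals:

```python
import re
import string


def normalize(text: str) -> str:
    """Standard SQuAD answer normalization: lowercase, drop punctuation
    and articles (a/an/the), and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """A prediction counts as correct if it matches any reference answer
    after normalization."""
    return any(normalize(prediction) == normalize(ref) for ref in references)


# Hypothetical predictions and gold answers, for illustration only.
predictions = {"q1": "The Eiffel Tower", "q2": "1912"}
references = {"q1": ["Eiffel Tower"], "q2": ["in 1912", "1912"]}

score = sum(
    exact_match(predictions[q], references[q]) for q in predictions
) / len(predictions)
print(f"exact match: {score:.2f}")
```

Averaging this per-example 0/1 score over the 100 sampled examples gives the accuracy reported above.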