qrdlgit comments

Results 33 comments of


                                            qrdlgit

Idea for Evals: Sorting numbers with repeats and negatives

@voynow Why did you close this? It looks good to me, but you should get Andrew's opinion.

Idea for Evals: Sorting numbers with repeats and negatives

Yeah, I saw that as well. TBH though, this seems like a great eval to me, but I'm just a user. Sorting things like this is a very common use...

feature: read_only Memory option

this seems like an extremely common use case, I feel like we (OP and I) are missing something obvious. If I create a massive (GBs) read only cache used for...

Are not merged PRs the result of irrelevancy to the model?

I downloaded all the merged PRs and asked GPT4 to summarize the common characteristics: The merged evals cover a wide range of topics and skills, including: - Language understanding: Japanese,...

Are not merged PRs the result of irrelevancy to the model?

@SkyaTura Yes, absolutely. For those serious about creating an eval here, there is definitely value in going back through all the PRs and reading them closely. That said, it's possible...

Are not merged PRs the result of irrelevancy to the model?

Not so much expensive, though perhaps a bit technically challenging. However, we can always ask GPT4, right? Try this prompt: _I'd like to better understand why PRs are being merged...

Are not merged PRs the result of irrelevancy to the model?

@SkyaTura I think your deviation was important and there needs to be more discussion around this topic - but you're right. I'll take the blame for the hijack here and...

Are not merged PRs the result of irrelevancy to the model?

One suggestion for folks at open ai, you might want to add an attribute to the checkbox: [] I understand that opening a PR, even if it meets the requirements...

Is there a list of tasks that chatgpt fails?

try the logic evals https://github.com/openai/evals/tree/main/evals/registry/data/logic They fail even with cot of reasoning

Taxonomy to use for model evaluation?

Anyone can do this! Works better with GPT4. ![image](https://user-images.githubusercontent.com/129564070/232673638-1ca2f0b2-b6b7-4587-bdf7-18b75b95bf90.png)