Yuhang Lai
Yuhang Lai
+1 encounter the same issue here
huggingface_hub 0.22.2 是最新版的
However, it would produce more stable (actually the same) generations using hf transformers when setting temperature = 0
Hi, this is the [leaderboard](https://ds1000-code-gen.github.io/model_DS1000.html) for DS-1000, which is on the project page. We will get some regular updates on it.
How to replicate this error? Did you just run `test_ds1000.py` to test `data/codex002-answers.jsonl`?
@saraLiii hi, > taking the unsafe_execute out of the check_correctness as global function fix the issue Does this work for you? Which version of python do you use?
I also observed this performance drop in 4o. I think the given solution is indeed wrong. It doesn't follow the code context and focuses too much on the example value...
Cool, I've tried some prompting on web ChatGPT. It seems that GPT-4o couldn't understand my instructions to complete the code and not repeat the example values, while GPT-4 or even...
@bienehito Glad to find a good prompt that makes 4o series work. Simply use "Only provide the code completion needed. Don't repeat the context code." as the system prompt can...
Yes, I believe this is the right procedure. I just uploaded the [cached answers of gpt-4o-2024-08-06](https://github.com/xlang-ai/DS-1000/blob/main/data/gpt-4o-2024-08-06-answers.jsonl), and I ran again the accuracy remains 0.599. Could you please check and see...