Yuhang Lai comments

Results 10 comments of


                                            Yuhang Lai

使用镜像作为环境变量后部分数据在最新版 datasets 上无法正常下载

+1 encounter the same issue here

使用镜像作为环境变量后部分数据在最新版 datasets 上无法正常下载

huggingface_hub 0.22.2 是最新版的

[Bug]: topk=1 and temperature=0 cause different output in vllm

However, it would produce more stable (actually the same) generations using hf transformers when setting temperature = 0

Will there be a leaderboard for DS-1000?

Hi, this is the [leaderboard](https://ds1000-code-gen.github.io/model_DS1000.html) for DS-1000, which is on the project page. We will get some regular updates on it.

AttributeError: Can't pickle local object 'check_correctness.<locals>.unsafe_execute'

How to replicate this error? Did you just run `test_ds1000.py` to test `data/codex002-answers.jsonl`?

AttributeError: Can't pickle local object 'check_correctness.<locals>.unsafe_execute'

@saraLiii hi, > taking the unsafe_execute out of the check_correctness as global function fix the issue Does this work for you? Which version of python do you use?

Questions about the reduced accuracy of GPT4o

I also observed this performance drop in 4o. I think the given solution is indeed wrong. It doesn't follow the code context and focuses too much on the example value...

Questions about the reduced accuracy of GPT4o

Cool, I've tried some prompting on web ChatGPT. It seems that GPT-4o couldn't understand my instructions to complete the code and not repeat the example values, while GPT-4 or even...

Questions about the reduced accuracy of GPT4o

@bienehito Glad to find a good prompt that makes 4o series work. Simply use "Only provide the code completion needed. Don't repeat the context code." as the system prompt can...

Questions about the reduced accuracy of GPT4o

Yes, I believe this is the right procedure. I just uploaded the [cached answers of gpt-4o-2024-08-06](https://github.com/xlang-ai/DS-1000/blob/main/data/gpt-4o-2024-08-06-answers.jsonl), and I ran again the accuracy remains 0.599. Could you please check and see...