
What is the prompt format?

Open ccocks opened this issue 9 months ago • 9 comments

I have an OpenAI-compatible endpoint that I'm preparing for evaluation, and I want to know what the FINAL request being sent looks like. For example (I'm looking at the HF dataset https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?row=0&views%5B%5D=v010_hf), is it something like this?

[
  {
    "role": "user",
    "content": "{complete_prompt if complete else instruct_prompt}"
  },
  {
    "role": "assistant",
    "content": "Sure, I can do that! Blah Blah Blah: ```python {canonical_solution} ``` This function does this and that..."
  }
]

Basically what format should I expect and what should I send back?

ccocks avatar Apr 01 '25 16:04 ccocks

The OpenAI endpoint will receive the user prompts and output the response, as described in the code.

Regarding the prompt context, please refer to this block.

I have an OpenAI compatible endpoint that I'm prepping up for evaluation

You should be able to run the evaluation without changing the current implementation. Passing a base_url arg should be enough. If you need any other customization, please check out this doc.
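
For illustration, the per-task request is roughly like this (a sketch only -- the model name, base_url, and sampling parameters are placeholders; the real call is in openai_request.py):

from openai import OpenAI

# Placeholders -- point base_url at your own OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# instruct_prompt (instruct split) or complete_prompt (complete split) from the HF dataset row.
task_prompt = "..."

resp = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": task_prompt}],  # a single user-role message per task
    temperature=0.0,
    n=1,
)
print(resp.choices[0].message.content)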

terryyz avatar Apr 01 '25 17:04 terryyz

Thank you. That block in utility.py is a great start, but I'd like to understand it in terms of the dataset columns, i.e. the mapping from the variables in the code to the columns in the dataset (https://huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?row=0&views%5B%5D=v010_hf).

I'm assuming:
task_prompt -> instruct_prompt / complete_prompt
instruction_prefix -> ?

Also, I don't quite understand why there is a "response" variable. Could you please clarify that? Is that the expected output?


ccocks avatar Apr 01 '25 19:04 ccocks

Instruction_prefix -> these two variables

The "response" variable is used for profiling, the same design as inside EvalPlus. It won't be used by any model APIs.

terryyz avatar Apr 02 '25 02:04 terryyz

@terryyz While evaluating, I'm getting issues with eval (0.00 pass@1), and:

AttributeError: module 'os' has no attribute 'killpg'

Is there a way I can debug this?

ccocks avatar Apr 07 '25 20:04 ccocks

The link for Instruction_prefix above has been fixed.

Is there a way I can debug this?

I think you may be running this on Windows; the evaluation is supposed to run inside a Linux env. You may find this helpful.
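
For example, a quick sanity check of the environment:

import os, platform

print(platform.system())        # should print "Linux"
print(hasattr(os, "killpg"))    # False on native Windows, True on Linux/macOS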

terryyz avatar Apr 09 '25 14:04 terryyz

@terryyz OK, but why is it giving 0 pass@1? In fact, to test it, I made my code retrieve the HF dataset (bcb_hard); if the input matches one of the prompts in the dataset, it replies with the last code block in the input (because it says "write code starting with...") plus the canonical_solution from the dataset. When tested manually it is fine, but it fails here.

ccocks avatar Apr 10 '25 19:04 ccocks

AttributeError: module 'os' has no attribute 'killpg' comes from the evaluation logic. If the call to os.killpg fails, you basically cannot run the evaluation correctly. You have to make sure you are running inside a correct environment first.

terryyz avatar Apr 11 '25 13:04 terryyz

@terryyz No module named "pytesseract", "lxml", "sklearn", etc., etc.

I'm using Linux now (a Jupyter notebook on HF Spaces) and getting a low pass rate because of all these import errors. Is my model supposed to respond with a dynamically importing solution? But when I look at the sanitized prompts for o3-mini-high, it never does that; it just imports normally. So do I need to install these modules myself before evaluating, or is the model supposed to use subprocess.run('pip install module')?

ccocks avatar Apr 16 '25 20:04 ccocks

Sorry, missed this.

I'm not sure why you are evaluating without using any of the provided execution environments. For example, you can freely use the Gradio endpoint on the Hugging Face Space we set up. In addition, you can use the Docker files for code execution.
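
For example, from Python you can inspect the hosted evaluator with gradio_client (a sketch -- the Space id here is an assumption, please check the README for the exact one):

from gradio_client import Client

client = Client("bigcode/bigcodebench-evaluator")  # assumed Space id
client.view_api()  # lists the callable endpoints and the arguments they expect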

terryyz avatar May 17 '25 22:05 terryyz