How to print all pass@k scores when generating 16 samples?
Hi,
I want to print the results of the pass@k metric for every k (e.g., k = 1, 2, 4, 8, 16) when generating 16 samples.
math_500_pass_k_at_16 = LightevalTaskConfig(
    name="math_500_pass_k_at_16",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 8, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 16, "n": 16}),
    ],
    version=2,
)
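For reference, each pass@k value should follow the standard unbiased estimator from the Codex paper, pass@k = 1 - C(n-c, k) / C(n, k) for c correct generations out of n. Here is a minimal sketch of that calculation, not necessarily lighteval's exact implementation:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn without replacement from n generations is correct, given that
    c of the n generations are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 generations, 3 of which are correct.
for k in (1, 2, 4, 8, 16):
    print(f"pass@{k} = {pass_at_k(16, 3, k):.3f}")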
But I can't see all of the results that I want. Does anyone know how to resolve this?
Can you share the results table you are getting? What you are doing looks good.
@clefourrier Yes, sure.
Below is the code I used:
math_500_pass_k_at_4 = LightevalTaskConfig(
    name="math_500_pass_k_at_4",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 4}),
    ],
    version=2,
)
The printed results for Qwen2.5-0.5B-Instruct are shown in the figure below.
cool
@tomtyiu I still couldn't see all the results I need - specifically pass@1, pass@2, and pass@4. I'm planning to create a trend plot showing performance across different k values. Do you know how to solve this?
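For context, the trend plot I have in mind is just pass@k versus k. A minimal matplotlib sketch, with placeholder scores standing in for the real numbers:

import matplotlib.pyplot as plt

# Placeholder scores; replace with the actual pass@k values from the results file.
scores = {1: 0.10, 2: 0.15, 4: 0.21, 8: 0.28, 16: 0.34}

ks = sorted(scores)
plt.plot(ks, [scores[k] for k in ks], marker="o")
plt.xscale("log", base=2)
plt.xticks(ks, [str(k) for k in ks])
plt.xlabel("k")
plt.ylabel("pass@k")
plt.title("MATH-500 pass@k trend")
plt.savefig("pass_at_k_trend.png")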
Oh, very interesting! I suspect the metric name is being overwritten by the last average computed - can you check with the latest version of lighteval on main and tell me what you get? I should have fixed this a couple of PRs ago.
@clefourrier Even after installing the latest version of lighteval and running the same code, I'm still getting the identical results shown in the figure.
I'm wondering if this might be related to lighteval's caching mechanism. From my understanding, lighteval implements caching somewhere in the code, which could cause multiple runs to produce identical results. Is it possible that the results shown in the figure are the same as previous ones due to this caching issue? If not, there might be a problem with the current pass@k metric implementation (though I'm not certain about this).
I suspect it's actually a problem with the metric name here - you should be getting pass@k_with_k=K&n=N for each K and N you are covering. I thought I had fixed that - cc @NathanHB for viz
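If the names do come out in that form, something like the following could collect them from a results dict for plotting; the key format here is my assumption, so adjust the pattern to whatever actually appears in your results file:

import re

# Hypothetical results dict; the key format is assumed from the naming scheme above.
results = {
    "pass@k_with_k=1&n=4": 0.10,
    "pass@k_with_k=2&n=4": 0.15,
    "pass@k_with_k=4&n=4": 0.21,
}

pattern = re.compile(r"pass@k_with_k=(\d+)&n=(\d+)")
scores = {}
for name, value in results.items():
    match = pattern.fullmatch(name)
    if match:
        scores[int(match.group(1))] = value

print(scores)  # e.g. {1: 0.1, 2: 0.15, 4: 0.21}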
@clefourrier Then how should I handle this? Is there another way to specify K and N besides the one I used below?
math_500_pass_k_at_4 = LightevalTaskConfig(
    name="math_500_pass_k_at_4",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 4}),
    ],
    version=2,
)
Sorry, let me rephrase: it seems like it is a bug! Good catch, thanks for reporting it.
The evaluation team is at a conference this week, so we are monitoring discussions but don't have the bandwidth to actually fix things yet. I'll take a look at this issue early next week. Feel free to re-ping me here on Wednesday if I haven't updated :)
Sorry it took a while! I made a fix here: https://github.com/huggingface/lighteval/pull/1017 - can you check if it's working for you?