
How to print all pass@k scores when generating 16 samples?

passing2961 opened this issue 4 months ago • 10 comments

Hi,

I want to print the results of all pass@k metrics (e.g., k=1, 2, 4, 8, 16) when generating 16 samples.


math_500_pass_k_at_16 = LightevalTaskConfig(
    name="math_500_pass_k_at_16",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 8, "n": 16}),
        Metrics.pass_at_k_math(sample_params={"k": 16, "n": 16}),
    ],
    version=2,
)

But I can't see all the results I want. Does anyone know how to resolve this?
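
For context, my understanding is that each pass@k score is estimated from the n generations per problem with the standard unbiased estimator (as in the Codex paper). A minimal sketch of that computation (my own, not lighteval's code):

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn
    from n generations is correct, given c correct generations."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. with n=16 generations of which c=5 are correct:
# pass_at_k(16, 5, 1), pass_at_k(16, 5, 2), ..., pass_at_k(16, 5, 16)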

passing2961 avatar Sep 29 '25 21:09 passing2961

Can you share the results table you are getting? What you are doing looks good.

clefourrier avatar Sep 30 '25 09:09 clefourrier

@clefourrier Yes, sure.

Below is the code


math_500_pass_k_at_4 = LightevalTaskConfig(
    name="math_500_pass_k_at_4",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 4}),
    ],
    version=2,
)

The printed results for Qwen2.5-0.5B-Instruct are shown in the figure below.

[Screenshot: printed results table for Qwen2.5-0.5B-Instruct]

passing2961 avatar Sep 30 '25 15:09 passing2961

cool

tomtyiu avatar Sep 30 '25 18:09 tomtyiu

@tomtyiu I still can't see all the results I need - specifically pass@1, pass@2, and pass@4. I'm planning to create a trend plot showing performance across different k values. Do you know how to solve this?

passing2961 avatar Sep 30 '25 18:09 passing2961

Oh, very interesting! I suspect the metric name is being overwritten by the last average computed - can you check with the latest version of lighteval on main and tell me what you get? I should have fixed this a couple of PRs ago.

clefourrier avatar Sep 30 '25 21:09 clefourrier

@clefourrier Even after installing the latest version of lighteval and running the same code, I'm still getting the same results, as shown in the figure.

[Screenshot: results table, identical to the previous run]

I'm wondering if this might be related to lighteval's caching mechanism. From my understanding, lighteval implements caching somewhere in the code, which could cause multiple runs to produce identical results. Is it possible that the results shown in the figure are the same as previous ones due to this caching issue? If not, there might be a problem with the current pass@k metric implementation (though I'm not certain about this).
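
To make the concern concrete, here is a toy sketch (purely hypothetical, not lighteval's actual cache) of how a cache keyed only by task and prompt would serve the same generations to every run, regardless of the metric configuration:

# Hypothetical cache sketch - not lighteval's implementation.
cache = {}

def generate(task: str, prompt: str, n: int) -> list[str]:
    key = (task, prompt, n)  # the metric configuration is not part of the key
    if key not in cache:
        cache[key] = [f"sample_{i}" for i in range(n)]  # stand-in for real model calls
    return cache[key]

first_run = generate("math_500", "Solve: 2 + 2 = ?", n=4)
second_run = generate("math_500", "Solve: 2 + 2 = ?", n=4)  # served from the cache
assert first_run is second_run  # identical outputs, hence identical scores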

passing2961 avatar Sep 30 '25 21:09 passing2961

I suspect it's actually a problem with the metric name here - you should be getting pass@k_with_k=K&n=N for each K and N you are covering. I thought I had fixed that - cc @NathanHB for viz
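
To illustrate what I mean (a toy example, not the actual lighteval internals): if the results table were keyed by a bare metric name, the three configs would collapse into one entry, whereas names parametrized with k and n keep all of them.

# Toy illustration of the suspected name collision (not lighteval code).
def collect(scores):
    table = {}
    for name, value in scores:
        table[name] = value  # identical names silently overwrite each other
    return table

# Bare name for every config -> only the last value survives (hypothetical scores).
print(collect([("pass@k", 0.41), ("pass@k", 0.55), ("pass@k", 0.63)]))

# Names carrying k and n -> all three values are reported.
print(collect([
    ("pass@k_with_k=1&n=4", 0.41),
    ("pass@k_with_k=2&n=4", 0.55),
    ("pass@k_with_k=4&n=4", 0.63),
]))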

clefourrier avatar Oct 01 '25 06:10 clefourrier

@clefourrier Then how should I handle this? How should I specify K and N, other than the way I used below?

math_500_pass_k_at_4 = LightevalTaskConfig(
    name="math_500_pass_k_at_4",
    suite=["custom"],
    prompt_function=math_500_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",
    hf_subset="default",
    hf_avail_splits=["test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    generation_size=32768,
    metrics=[
        Metrics.pass_at_k_math(sample_params={"k": 1, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 2, "n": 4}),
        Metrics.pass_at_k_math(sample_params={"k": 4, "n": 4}),
    ],
    version=2,
)

passing2961 avatar Oct 01 '25 07:10 passing2961

Sorry, let me rephrase: it seems like it is indeed a bug! Good catch reporting it.

The evaluation team is at a conference this week, so we are monitoring discussions but don't have the bandwidth to actually fix things yet. I'll take a look at this issue early next week. Feel free to re-ping me here on Wednesday if I haven't updated :)

clefourrier avatar Oct 02 '25 22:10 clefourrier

Sorry it took a while. I made a fix here: https://github.com/huggingface/lighteval/pull/1017 - can you check if it's working for you?

clefourrier avatar Oct 14 '25 08:10 clefourrier