
[BUG] git clone issues due to filenames with special characters

Open amosyou opened this issue 3 months ago • 7 comments

Describe the bug

When cloning the repo, I'm running into an issue where certain test files cannot be created. Here's the error trace.

Cloning into 'lighteval'...
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
remote: Enumerating objects: 6529, done.
remote: Counting objects: 100% (2557/2557), done.
remote: Compressing objects: 100% (1076/1076), done.
remote: Total 6529 (delta 1983), reused 1481 (delta 1481), pack-reused 3972 (from 3)
Receiving objects: 100% (6529/6529), 3.07 MiB | 8.07 MiB/s, done.
Resolving deltas: 100% (4034/4034), done.
Updating files: 100% (533/533), done.
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_agieval:lsat-rc|0_2025-11-05T14-52-08.352779.parquet: Invalid argument
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-transformers/details_arc:challenge|25_2025-11-05T14-43-47.148527.parquet: Invalid argument
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_arc:challenge|25_2025-11-05T14-52-08.352779.parquet: Invalid argument
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_hellaswag|10_2025-11-05T14-52-08.352779.parquet: Invalid argument
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_agieval:sat-en|0_2025-11-05T14-52-08.352779.parquet: Invalid argument
...
error: unable to create file tests/reference_details/SmolLM2-1.7B-Instruct-transformers/details_agieval:aqua-rat|0_2025-11-05T14-43-47.148527.parquet: Invalid argument
Filtering content: 100% (61/61), 65.52 MiB | 21.98 MiB/s, done.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

I believe this is due to filenames that contain special characters like "|" and ":". For example, creating a directory with either character in its name gives a similar error.

mkdir "example_|"
mkdir: cannot create directory ‘example_|’: Invalid argument
mkdir "example_:"
mkdir: cannot create directory ‘example_:’: Invalid argument
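
To see which paths would trip this up before attempting a checkout, a minimal sketch along these lines could flag them. The helper name `find_unsafe_paths` and the default character set are assumptions for illustration (only "|" and ":" come from the errors above); other filesystems may reject a different set.

```python
from typing import Iterable, List


def find_unsafe_paths(paths: Iterable[str], unsafe_chars: str = "|:") -> List[str]:
    """Return the paths that contain at least one character from unsafe_chars.

    The default set only covers the two characters seen in the error trace;
    it is an assumption, not a complete list for any particular filesystem.
    """
    return [p for p in paths if any(ch in p for ch in unsafe_chars)]


# Paths taken from the error trace, plus one hypothetical safe name for contrast.
sample_paths = [
    "tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_hellaswag|10_2025-11-05T14-52-08.352779.parquet",
    "tests/reference_details/SmolLM2-1.7B-Instruct-vllm/details_agieval:sat-en|0_2025-11-05T14-52-08.352779.parquet",
    "tests/reference_details/example_safe_name.parquet",  # hypothetical
]

flagged = find_unsafe_paths(sample_paths)
```

Feeding it the list of tracked files (for example, one path per line from `git ls-files`) would give the set of files that need renaming.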

To Reproduce

git clone git@github.com:huggingface/lighteval.git

Expected behavior

git clone should work.

Version info

Linux (amd64)

amosyou avatar Nov 13 '25 07:11 amosyou

hey ! thanks for raising the issue. weird that this happens on linux, do you have git lfs installed ? it might be why it's not working.

NathanHB avatar Nov 13 '25 13:11 NathanHB

hi! I do have git-lfs installed. Aren't the tests/reference_details files committed with git-lfs?

amosyou avatar Nov 13 '25 16:11 amosyou

I'm only trying to download one of the files here, and it's showing the same error. Would it not be feasible to change the names of these files? In general it's just bad practice to have files with | or : in them

wget https://github.com/huggingface/lighteval/raw/refs/heads/main/tests/reference_details/Qwen2.5-VL-3B-Instruct-vlm/details_mmmu_pro:standard-4%7C0_2025-11-05T15-23-34.026089.parquet
--2025-11-13 18:15:55--  https://github.com/huggingface/lighteval/raw/refs/heads/main/tests/reference_details/Qwen2.5-VL-3B-Instruct-vlm/details_mmmu_pro:standard-4%7C0_2025-11-05T15-23-34.026089.parquet
Resolving github.com (github.com)... 140.82.116.4
Connecting to github.com (github.com)|140.82.116.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/huggingface/lighteval/refs/heads/main/tests/reference_details/Qwen2.5-VL-3B-Instruct-vlm/details_mmmu_pro%3Astandard-4%7C0_2025-11-05T15-23-34.026089.parquet [following]
--2025-11-13 18:15:55--  https://media.githubusercontent.com/media/huggingface/lighteval/refs/heads/main/tests/reference_details/Qwen2.5-VL-3B-Instruct-vlm/details_mmmu_pro%3Astandard-4%7C0_2025-11-05T15-23-34.026089.parquet
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11538690 (11M) [application/octet-stream]
details_mmmu_pro:standard-4|0_2025-11-05T15-23-34.026089.parquet: Invalid argument

Cannot write to ‘details_mmmu_pro:standard-4|0_2025-11-05T15-23-34.026089.parquet’ (Success).
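
The download itself succeeds (HTTP 200); only writing the local file fails, because the name derived from the URL contains ":" and "|". As a rough sketch of a workaround, a safe local name could be derived from the URL before saving. The helper `safe_local_name` and the choice of "_" as the replacement are assumptions for illustration, not anything lighteval or wget provides.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def safe_local_name(url: str, unsafe_chars: str = "|:", replacement: str = "_") -> str:
    """Derive a local filename from a URL, replacing characters the filesystem rejects."""
    # Take the last path segment and decode percent-escapes such as %7C -> "|".
    name = unquote(posixpath.basename(urlsplit(url).path))
    # Swap each unsafe character for the replacement (assumed to be "_" here).
    for ch in unsafe_chars:
        name = name.replace(ch, replacement)
    return name


url = (
    "https://github.com/huggingface/lighteval/raw/refs/heads/main/tests/"
    "reference_details/Qwen2.5-VL-3B-Instruct-vlm/"
    "details_mmmu_pro:standard-4%7C0_2025-11-05T15-23-34.026089.parquet"
)
local_name = safe_local_name(url)
```

The resulting name can then be used as the output path for the download instead of the name taken from the URL.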

amosyou avatar Nov 13 '25 18:11 amosyou

It is ! I was just trying to find out why it does not work for you

I will open a PR to rename those and tag you to try it out :)

NathanHB avatar Nov 14 '25 11:11 NathanHB

thanks! yeah in my setup im using nfs for storage so special characters are not allowed.

i tried running some evals and caching fails due to the task name being {task_name}|{num_fewshots} in lighteval_task.py and registry.py. changed it to {task_name}-{num_fewshots} which seems to solve the problem. is there other logic somewhere that splits on the | as a delimiter?
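
One thing to watch when swapping the delimiter: task names in the reference files above already contain "-" (for example `agieval:lsat-rc`), so any code that splits the identifier on every "-" would break where splitting on "|" did not. A minimal sketch of a round-trip that splits only on the last separator is below; `format_task_id` and `parse_task_id` are hypothetical helpers, not functions from `lighteval_task.py` or `registry.py`.

```python
from typing import Tuple


def format_task_id(task_name: str, num_fewshots: int, sep: str = "-") -> str:
    """Build an identifier of the form {task_name}{sep}{num_fewshots}."""
    return f"{task_name}{sep}{num_fewshots}"


def parse_task_id(task_id: str, sep: str = "-") -> Tuple[str, int]:
    """Split on the LAST separator only, so task names containing sep survive."""
    name, _, shots = task_id.rpartition(sep)
    return name, int(shots)


task_id = format_task_id("agieval:lsat-rc", 0)
name, shots = parse_task_id(task_id)

# A naive split on every "-" yields three pieces instead of two.
naive_parts = task_id.split("-")
```

Note that this only addresses the "|" delimiter; a ":" inside the task name itself is left untouched and would still need handling wherever the name ends up in a file path.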

amosyou avatar Nov 14 '25 22:11 amosyou

hey ! Can you try this and see if it works for you ? :) #1062

NathanHB avatar Nov 17 '25 10:11 NathanHB

was able to git checkout that branch. thanks!

amosyou avatar Nov 18 '25 01:11 amosyou