when i generate traindata python -m eagle.ge_data.allocation --outdir [path of data],but i meet the error
![Uploading 1721797678355.png…]()
Traceback (most recent call last):
File "/root/autodl-tmp/EAGLE/eagle/ge_data/ge_data_all_vicuna.py", line 148, in
ds = build_dataset_rank(bigtokenizer)
File "/root/autodl-tmp/EAGLE/eagle/ge_data/ge_data_all_vicuna.py", line 130, in build_dataset_rank
ds1 = ds1.map(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3253, in map
for rank, done, content in iflatmap_unordered(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
[async_result.get(timeout=0.05) for async_result in async_results]
File "/root/miniconda3/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 718, in
[async_result.get(timeout=0.05) for async_result in async_results]
File "/root/miniconda3/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
raise self._value
IndexError: list index out of range
There might be an issue with parallel processing. You can try using num_proc=1 to see if it helps.