Token indices sequence length is longer than the specified maximum sequence length for this model (4158 > 2048)
**Describe the bug**
Running the Pythia-7B fine-tune script on 4 x A10 GPUs (24 GB each).
It seems like an issue with the sequence length:
```
Token indices sequence length is longer than the specified maximum sequence length for this model (4158 > 2048). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "/home/ec2-user/OpenChatKit/training/dist_clm_train.py", line 358, in
```
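The error message itself suggests a workaround: truncate or chunk each tokenized example to the model's 2048-token context before it reaches the model. A minimal sketch in plain Python (this is not OpenChatKit's actual preprocessing; `chunk_token_ids` and `MAX_LEN` are hypothetical names, and the lengths come straight from the error message above):

```python
MAX_LEN = 2048  # maximum context length reported in the error

def chunk_token_ids(token_ids, max_len=MAX_LEN):
    """Split one long token-id sequence into chunks of at most max_len."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

# A 4158-token sequence (the length from the error) splits into
# two full-length chunks plus a 62-token remainder:
chunks = chunk_token_ids(list(range(4158)))
print([len(c) for c in chunks])  # → [2048, 2048, 62]
```

Whether to chunk or simply truncate depends on whether the tails of long documents matter for the fine-tune.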
**To Reproduce**
Steps to reproduce the behavior:
Run the Pythia train script with the following modifications:
```
--num-layers 16 --embedding-dim 4096 \
--world-size 4 --pipeline-group-size 2 --data-group-size 2 \
```
**Expected behavior**
Training should complete without indexing errors.
Using the standard AWS Deep Learning AMI with CUDA.
This problem also occurred when I reproduced it with the 20B model.
Hi, did you find a solution? I am facing the same problem. I am trying to test Alpa for distributed parallel training.

```
NVIDIA-SMI 470.182.03    Driver Version: 470.182.03    CUDA Version: 11.4
```

`nvcc -V`:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
```
When I run `python3 -m alpa.test_install`:

```
File "cupy_backends/cuda/libs/nccl.pyx", line 283, in cupy_backends.cuda.libs.nccl.NcclCommunicator.init
File "cupy_backends/cuda/libs/nccl.pyx", line 129, in cupy_backends.cuda.libs.nccl.check_status
cupy_backends.cuda.libs.nccl.NcclError: NCCL_ERROR_UNHANDLED_CUDA_ERROR: unhandled cuda error
```
Any help would be really appreciated. I tried different versions of CUDA; the same error occurs every time.
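One way to get more detail than the opaque `NCCL_ERROR_UNHANDLED_CUDA_ERROR` is to turn on NCCL's own debug logging before rerunning the failing command. These are standard NCCL environment variables; the subsystem list shown is just one example choice:

```shell
# Make NCCL log what it is doing before the unhandled CUDA error occurs
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,COLL
```

Rerunning `python3 -m alpa.test_install` with these set should print which CUDA call NCCL was attempting when it failed, which narrows down the root cause.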
Also seeing the same error...