Error when training on a single GPU
Thanks for the work you have done.
I encounter the following error when training on a single GPU:
ValueError: num_samples should be a positive integer value, but got num_samples=-67108864
The command I am using is: python train.py --rank 0 --gpu 0
Can you please assist?
Thanks
Hello @Adnan-Khan7, could you share the detailed error, including the full traceback and the line of code that raises it?
Sure, please have a look at the traceback:
Traceback (most recent call last):
File "train.py", line 319, in
Have you changed any default arguments? I ask because there is no logic in the code that should make num_samples negative.
Do you get the same error with the command python train.py --world-size 1 --rank 0?
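For context on why the world size is worth checking: the reported value -67108864 is exactly -(2**26), which would be consistent with a positive sample count being divided by a world size of -1. The sketch below only illustrates that arithmetic. The names samples_per_replica, check_num_samples, and total_samples are hypothetical, and the idea that the script divides by a world size whose placeholder default is -1 is an assumption, not something confirmed in this thread.

```python
# Hypothetical illustration; none of these names or defaults are taken from train.py.
total_samples = 2 ** 26  # assumed total, e.g. 64 samples per batch * 2**20 iterations


def samples_per_replica(total: int, world_size: int) -> int:
    """Split a total sample count evenly across `world_size` processes."""
    return total // world_size


def check_num_samples(num_samples: int) -> int:
    """Reject non-positive counts, mirroring the wording of the reported error."""
    if not isinstance(num_samples, int) or num_samples <= 0:
        raise ValueError(
            "num_samples should be a positive integer value, "
            f"but got num_samples={num_samples}"
        )
    return num_samples


# An explicit world size of 1 keeps the count positive.
print(samples_per_replica(total_samples, 1))   # 67108864

# An assumed placeholder world size of -1 flips the sign.
print(samples_per_replica(total_samples, -1))  # -67108864
```

If this is what happens, passing --world-size explicitly should change the value that reaches the sampler, which is why the two commands are worth comparing.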
I didn't change any other default arguments. Adding --world-size 1 now raises a ZeroDivisionError. Here is the command I am running, followed by its output:
python train.py --world-size 1 --rank 0 --overwrite
train.py:40: UserWarning: You have chosen to seed training. This will turn on the CUDNN deterministic setting, which can slow down your training considerably! You may see unexpected behavior when restarting from checkpoints.
warnings.warn('You have chosen to seed training. '
Traceback (most recent call last):
File "train.py", line 319, in
Adding --gpu 0 (python train.py --world-size 1 --rank 0 --gpu 0 --overwrite) produces the same error, with one additional warning:
train.py:40: UserWarning: You have chosen to seed training. This will turn on the CUDNN deterministic setting, which can slow down your training considerably! You may see unexpected behavior when restarting from checkpoints.
warnings.warn('You have chosen to seed training. '
train.py:47: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism.
warnings.warn('You have chosen a specific GPU. This will completely '
Traceback (most recent call last):
File "train.py", line 319, in
Dear Lee, do you have any comments on the error above?