11 comments by Huzhen

I added the statement os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1' to the train_net script. When I run training with the train_net script, it reports the error "Default process group is not initialized". How can I solve it? And the default...
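Setting `CUDA_VISIBLE_DEVICES` only controls which GPUs a process can see. It does not start one worker per GPU or call `torch.distributed.init_process_group`, and the error says exactly that: the default process group was never initialized. Below is a minimal sketch, assuming the mask is set before anything initializes CUDA. The helpers `visible_gpu_count` and `check_gpu_mask` are hypothetical and are not part of cvpods. They check that the mask agrees with the number of GPUs the launcher is told to use.

```python
import os

# Set the mask before anything initializes CUDA; the safest place is the very
# top of train_net.py, before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"


def visible_gpu_count(env=None):
    """Hypothetical helper: count the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because then every GPU is
    visible and the count cannot be known without querying CUDA.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Ignore empty entries and stray spaces such as "0, 1".
    return len([part for part in value.split(",") if part.strip()])


def check_gpu_mask(num_gpus, env=None):
    """Hypothetical helper: fail early if the mask and the GPU count disagree."""
    count = visible_gpu_count(env)
    if count is not None and count != num_gpus:
        raise ValueError(
            f"CUDA_VISIBLE_DEVICES exposes {count} GPU(s) but num_gpus={num_gpus}"
        )
```

Even with a matching mask, multi-GPU training still needs the launcher to know the GPU count (for example `--num-gpus 2`), so that it can spawn the workers and set up the process group.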

Oh, I know now that I need to modify the IMS_PER_BATCH and IMS_PER_DEVICE parameters in the config script to change the batch size. But for training on two 3090 graphics cards, I...

I have now modified the corresponding parameters in the config script, but running the train_net script still reports the error "Default process group is not initialized".

Traceback (most recent call last):
  File "train_net.py", line 106, in <module>
    launch(
  File "/media/data/huzhen/YOLOF-torch/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "train_net.py", line 96, in main
    runner.train()
  File "/media/data/huzhen/YOLOF-torch/cvpods/engine/runner.py", line 270, in...

I am using the train_net script under the tools folder for training. I adjusted some parameters in the config script, including IMS_PER_BATCH, IMS_PER_DEVICE, WARMUP_FACTOR and WARMUP_ITERS, and added an extra statement in...
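WARMUP_FACTOR and WARMUP_ITERS usually drive a linear learning-rate warmup. The rate starts at the base rate multiplied by WARMUP_FACTOR and ramps linearly to the base rate over WARMUP_ITERS iterations. The sketch below assumes cvpods uses the same detectron2-style linear warmup formula. That is not confirmed in this thread, and `base_lr` is an illustrative parameter name.

```python
def warmup_lr(base_lr, iteration, warmup_iters, warmup_factor):
    """Learning rate at `iteration` under an (assumed) linear warmup."""
    if warmup_iters <= 0 or iteration >= warmup_iters:
        # Warmup finished (or disabled): use the base learning rate.
        return base_lr
    alpha = iteration / warmup_iters
    # Interpolate the multiplier from warmup_factor at iteration 0 up to 1.0.
    factor = warmup_factor * (1 - alpha) + alpha
    return base_lr * factor
```

Under this formula, raising WARMUP_ITERS stretches the ramp over more iterations. Lowering WARMUP_FACTOR makes the first iterations start from a smaller learning rate.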

Now there is a new error related to the 'dist URL' parameter: cvpods.engine.launch ERROR: Process group URL: tcp://127.0.0.1:50147 RuntimeError: Address already in use ai... Your code is really too hard to run...
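"Address already in use" means some other process is still bound to the port in the dist URL. One example is an earlier training run that did not exit cleanly. A common workaround is to pick a port the operating system reports as free. The sketch below builds a URL in the same tcp://127.0.0.1:PORT form as the one in the error. Note that there is a small race between closing the probe socket and the training job binding the port. The helper names are hypothetical.

```python
import socket


def find_free_port():
    """Ask the OS for a TCP port on localhost that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel choose any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def make_dist_url(port=None):
    """Build a process-group URL, picking a free port if none is given."""
    port = find_free_port() if port is None else port
    return f"tcp://127.0.0.1:{port}"
```

How the URL reaches cvpods is an assumption here. detectron2-style launchers expose a `--dist-url` option, so checking `pods_train --help` for an equivalent is a reasonable first step.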

With the training method in the README, I can only change the number of GPUs; it does not let me choose which GPU IDs to train on at all.
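The GPUs to use are normally chosen from outside the training script with `CUDA_VISIBLE_DEVICES`. Presumably `--num-gpus` only sets how many workers the launcher starts, as in the command quoted later in this thread (`CUDA_VISIBLE_DEVICES=0,1,2,3 pods_train --num-gpus 4`). The sketch below uses a hypothetical `build_launch` helper to assemble that command and environment for an arbitrary set of GPU IDs. The GPU count it passes always matches the number of IDs exposed.

```python
import os


def build_launch(gpu_ids, script="pods_train"):
    """Return (argv, env) that run `script` on the chosen physical GPUs only."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    env = dict(os.environ)
    # Expose only these GPUs; inside the job CUDA renumbers them 0..N-1.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # Keep the GPU count passed to the launcher equal to the number exposed.
    argv = [script, "--num-gpus", str(len(gpu_ids))]
    return argv, env
```

For example, `argv, env = build_launch([2, 3])` followed by `subprocess.run(argv, env=env, check=True)` would train on physical GPUs 2 and 3 only.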

OK, I know. When training with 2 GPUs, it still reports an error: assert base_world_size == 8, "IMS_PER_BATCH/DEVICE in config file is used for 8 GPUs" AssertionError: IMS_PER_BATCH/DEVICE in config...
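Judging from the assertion message, the training code seems to derive a "base world size" from IMS_PER_BATCH and IMS_PER_DEVICE and require it to equal 8. If so, the batch settings in the config must keep describing 8 GPUs even when training on fewer. The sketch below reconstructs that check from the message; it is not copied from cvpods. Both the division and the example values are assumptions for illustration.

```python
def base_world_size(ims_per_batch, ims_per_device):
    """GPU count a config was written for, inferred from its batch settings."""
    if ims_per_device <= 0 or ims_per_batch % ims_per_device:
        raise ValueError("IMS_PER_BATCH must be a positive multiple of IMS_PER_DEVICE")
    return ims_per_batch // ims_per_device


def check_config(ims_per_batch, ims_per_device):
    """Reconstruction of the assertion quoted in the error message."""
    size = base_world_size(ims_per_batch, ims_per_device)
    assert size == 8, "IMS_PER_BATCH/DEVICE in config file is used for 8 GPUs"
    return size
```

Under this reading, shrinking IMS_PER_BATCH to suit two GPUs (16 with IMS_PER_DEVICE = 8) trips the assertion, while keeping the 8-GPU ratio (64 with 8) passes. How cvpods then adapts the run to the actual GPU count is not shown in the thread.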

I am using 4 GPUs for training the way you suggested, like this: "CUDA_VISIBLE_DEVICES=0,1,2,3 pods_train --num-gpus 4". But it still reports an error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid...
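A truncated NCCL error like this one is easier to diagnose with NCCL's own logging turned on. `NCCL_DEBUG=INFO` is a documented NCCL environment variable that makes NCCL print details about its initialization and failures. The sketch below uses a hypothetical `with_nccl_debug` helper to add it to a copy of the launch environment and leaves everything else, including `CUDA_VISIBLE_DEVICES`, untouched.

```python
import os


def with_nccl_debug(env=None, level="INFO"):
    """Return a copy of `env` with NCCL debug logging enabled."""
    new_env = dict(os.environ if env is None else env)
    # NCCL reads this variable at startup and logs what it is doing.
    new_env["NCCL_DEBUG"] = level
    return new_env
```

The result can be passed as `env=` to `subprocess.run` together with the `pods_train --num-gpus 4` command. The shell equivalent is to prefix the command with `NCCL_DEBUG=INFO`.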

OK, I'll try to see if I can work it out. Thanks!