Huzhen comments

Results: 11 comments by Huzhen

How to modify the identifier of GPU and the number of GPU to train the model?

I added the statement os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1' to the train_net script. When I run training with the train_net script, it reports an error: Default process group is not initialized. How can I solve it? And the default...
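
For readers hitting the same problem, here is a minimal sketch of selecting GPUs by identifier from inside a Python script. The helper name `select_gpus` is hypothetical and not part of cvpods. It assumes the variable is set before CUDA is initialized, which in practice means before the first `import torch` that touches the GPU. Setting this variable only masks devices; it does not create a distributed process group, so on its own it would not be expected to remove the "Default process group is not initialized" error.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given physical GPU ids to this process.

    Hypothetical helper for illustration; returns the number of visible GPUs.
    Must run before CUDA is initialized to have any effect.
    """
    ids = [int(i) for i in gpu_ids]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU id in %r" % (gpu_ids,))
    # Plain ASCII commas, no spaces: "0,1"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return len(ids)


num_gpus = select_gpus([0, 1])
print(os.environ["CUDA_VISIBLE_DEVICES"], num_gpus)
```

Inside the process, the selected devices are then renumbered from 0, so physical GPUs 2 and 3 would appear as devices 0 and 1.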

How to modify the identifier of GPU and the number of GPU to train the model?

Oh, I see that I need to modify the IMS_PER_BATCH and IMS_PER_DEVICE parameters in the config script to change the batch size. But for training on two 3090 graphics cards, I...

How to modify the identifier of GPU and the number of GPU to train the model?

I have now modified the corresponding parameters in the config script, but running the train_net script still reports an error: Default process group is not initialized

How to modify the identifier of GPU and the number of GPU to train the model?

Traceback (most recent call last):
  File "train_net.py", line 106, in <module>
    launch(
  File "/media/data/huzhen/YOLOF-torch/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "train_net.py", line 96, in main
    runner.train()
  File "/media/data/huzhen/YOLOF-torch/cvpods/engine/runner.py", line 270, in...

How to modify the identifier of GPU and the number of GPU to train the model?

I am using the train_net script under the tools folder for training. I adjusted some parameters in the config script, including IMS_PER_BATCH, IMS_PER_DEVICE, WARMUP_FACTOR and WARMUP_ITERS, and added an extra statement in...

How to modify the identifier of GPU and the number of GPU to train the model?

Now there is a new error related to the 'dist URL' parameter: cvpods.engine.launch ERROR: Process group URL: tcp://127.0.0.1:50147 RuntimeError: Address already in use. Sigh... your code is actually too hard to run...
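
"Address already in use" generally means another process, often a previous training run that has not exited, still holds the TCP port in the process group URL. As a rough sketch using only the Python standard library, one way to obtain a port that is currently free is to bind to port 0 and let the operating system choose. The helper name `find_free_port` is hypothetical, and how the resulting URL is passed to the launcher is not shown in this thread, so that part is left out.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is unused right now.

    Binding to port 0 lets the kernel pick a free port. The socket is
    closed before returning, so another process could still grab the
    port in the short window before it is reused.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
dist_url = "tcp://127.0.0.1:%d" % port
print(dist_url)
```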

How to modify the identifier of GPU and the number of GPU to train the model?

Using the method in the README for training, I can only change the number of GPUs; it gives no way at all to choose which GPU identifiers to train on.

How to modify the identifier of GPU and the number of GPU to train the model?

OK, I see. Using 2 GPUs for training, it still reports an error: assert base_world_size == 8, "IMS_PER_BATCH/DEVICE in config file is used for 8 GPUs" AssertionError: IMS_PER_BATCH/DEVICE in config...
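
The assertion message suggests that the check compares the GPU count implied by the config against 8. The sketch below is only an illustration of that arithmetic, assuming `base_world_size` is computed as IMS_PER_BATCH divided by IMS_PER_DEVICE; the thread does not show the actual cvpods code, and the function name and the numbers are hypothetical.

```python
def implied_world_size(ims_per_batch, ims_per_device):
    """Number of GPUs implied by the two batch-size settings.

    Assumption for illustration: world size = IMS_PER_BATCH / IMS_PER_DEVICE.
    """
    if ims_per_device <= 0 or ims_per_batch % ims_per_device != 0:
        raise ValueError("IMS_PER_BATCH must be a positive multiple of IMS_PER_DEVICE")
    return ims_per_batch // ims_per_device


# Hypothetical values: a config written for 8 GPUs with 2 images each.
print(implied_world_size(16, 2))  # 8
# Shrinking the total batch for 2 GPUs changes the implied world size,
# which a check of the form `== 8` would reject.
print(implied_world_size(4, 2))   # 2
```

Under this reading, editing both values so that their ratio stops being 8 would trip the assertion regardless of how many GPUs are actually in use.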

How to modify the identifier of GPU and the number of GPU to train the model?

I am using 4 GPUs for training in the way you suggested, like this: CUDA_VISIBLE_DEVICES=0,1,2,3 pods_train --num-gpus 4 But it still reports an error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid...

How to modify the identifier of GPU and the number of GPU to train the model?

OK, I will try to see if I can work it out. Thanks!