
Out-of-memory error in COCO evaluation, even though I used 8 RTX 3090 GPUs and did not encounter it when evaluating the LVIS dataset

Open study-hard-forever opened this issue 2 years ago • 6 comments

  1. I encountered a torch.cuda.OutOfMemoryError: CUDA out of memory error, even though I used 8 RTX 3090 GPUs and did not run into it when evaluating the LVIS dataset. I tried reducing the batch size to 8 and even switching to vit_b/vit_s, but that did not solve it. As a side note, the vit_s configuration file configs/open-vocabulary/coco/vits.yaml has a file-naming error: CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vitl14.pth" should be CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vits14.pth" (see the sketch after this list).
  2. The second problem is that I get a lot of warning messages (which may not affect normal operation), but I noticed the message "xFormers not available" and I do not understand exactly what it means, so could you please explain it? I tried installing other versions of xFormers, such as xformers==0.0.18 and xformers==0.0.20, but they depend on torch 2, which conflicts with the detectron2 setup described in the official instructions (https://github.com/facebookresearch/detectron2/releases), so that did not go as smoothly as you suggested.
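
A minimal sketch of the fix for the naming error in point 1 (assuming the released prototype file is indeed named class_prototypes_all.vits14.pth; back up the config first):

cp configs/open-vocabulary/coco/vits.yaml configs/open-vocabulary/coco/vits.yaml.bak
sed -i 's/class_prototypes_all\.vitl14\.pth/class_prototypes_all.vits14.pth/' configs/open-vocabulary/coco/vits.yaml
grep CLASS_PROTOTYPES configs/open-vocabulary/coco/vits.yaml   # verify the corrected path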

Note: I followed the configuration exactly as given in your tutorial:

git clone https://github.com/mlzxy/devit.git
conda create -n devit python=3.9
conda activate devit
pip install -r devit/requirements.txt
pip install -e ./devit

but still encountered the problem mentioned in the first question.
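
A quick sanity check of the resulting environment (a minimal sketch; the expected versions are the ones shown in the environment info in the log below):

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
python -c "import detectron2; print(detectron2.__file__)"
python -c "import xformers" 2>/dev/null || echo "xFormers not installed"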

Looking forward to your reply, thanks!

command: vit=b task=ovd dataset=coco bash scripts/eval.sh

The logs are as follows:

task=ovd, vit=b, dataset=coco, shot=10, split=1, num_gpus=8
xFormers not available
Command Line Args: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
[10/31 14:52:24 detectron2]: Rank of current process: 0. World size: 8
[10/31 14:52:25 detectron2]: Environment info:


sys.platform             linux
Python                   3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
numpy                    1.22.4
detectron2               RegionCLIP @/xxx/devit/tools/../detectron2
Compiler                 GCC 9.4
CUDA compiler            CUDA 11.8
detectron2 arch flags    8.6
DETECTRON2_ENV_MODULE
PyTorch                  1.13.1+cu117 @/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch
PyTorch debug build      False
GPU available            True
GPU 0,1,2,3,4,5,6,7      NVIDIA GeForce RTX 3090 (arch=8.6)
CUDA_HOME                /usr/local/cuda-11.8
Pillow                   9.5.0
torchvision              0.14.1+cu117 @/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torchvision
torchvision arch flags   3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                   0.1.5.post20221221
iopath                   0.1.8
cv2                      4.8.1


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.7
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.5
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[10/31 14:52:25 detectron2]: Command line arguments: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
[10/31 14:52:25 detectron2]: Contents of args.config_file=configs/open-vocabulary/coco/vitb.yaml:
_BASE_: "../../Base-RCNN-C4.yaml"
DE:
  CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_base.vitb14.pth,weights/initial/open-vocabulary/prototypes/coco/class_prototypes_novel.vitb14.pth"
  BG_PROTOTYPES: "weights/initial/background/background_prototypes.vitb14.pth"
  BG_CLS_LOSS_WEIGHT: 0.2
  TOPK: 10

MODEL:
  META_ARCHITECTURE: "OpenSetDetectorWithExamples"
  BACKBONE:
    NAME: "build_dino_v2_vit"
    TYPE: "base"
  WEIGHTS: ""
  MASK_ON: False
  RPN:
    HEAD_NAME: StandardRPNHead
    IN_FEATURES: ["res4"]
  ROI_HEADS:
    SCORE_THRESH_TEST: 0.001
  ROI_BOX_HEAD:
    NAME: ""
    NUM_FC: 0
    POOLER_RESOLUTION: 7
    CLS_AGNOSTIC_BBOX_REG: True
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
DATASETS:
  TRAIN: ("coco_2017_ovd_b_train",)
  TEST: ("coco_2017_ovd_all_test",)
TEST:
  EVAL_PERIOD: 5000
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.002
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  WARMUP_ITERS: 5000
  CHECKPOINT_PERIOD: 5000

INPUT:
  MIN_SIZE_TRAIN_SAMPLING: choice
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
  FORMAT: "RGB"

[10/31 14:52:25 detectron2]: Full config saved to output/eval/open-vocabulary/coco/vitb/config.yaml
[10/31 14:52:25 d2.utils.env]: Using a generated random seed 26738411
('coco_2017_ovd_all_test',)
[10/31 14:52:33 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/trained/open-vocabulary/coco/vitb_0079999.pth ...
[10/31 14:52:34 d2.data.datasets.coco]: Loaded 4836 images in COCO format from datasets/coco/annotations/ovd_ins_val2017_all.json
[10/31 14:52:34 d2.data.build]: Distribution of instances among all 65 categories:

category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
bench 411 bird 427 cat 202
dog 218 horse 272 sheep 354
cow 372 elephant 252 bear 71
zebra 266 giraffe 232 backpack 371
umbrella 407 handbag 540 tie 252
suitcase 299 frisbee 115 skis 241
snowboard 69 kite 327 skateboard 179
surfboard 267 bottle 1013 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 pizza 284 donut 328
cake 310 chair 1771 couch 261
bed 163 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 microwave 55 oven 143
toaster 9 sink 225 refrigerator 126
book 1129 clock 267 vase 274
scissors 36 toothbrush 57
total 32721
[10/31 14:52:34 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[10/31 14:52:34 d2.data.common]: Serializing 4836 elements to byte tensors and concatenating them all ...
[10/31 14:52:34 d2.data.common]: Serialized dataset takes 17.62 MiB
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
[10/31 14:52:35 d2.evaluation.evaluator]: Start inference on 605 batches
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
xFormers not available
(the line above repeats 31 more times)
/xxx/devit/tools/../detectron2/structures/boxes.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
(the warning above repeats 31 more times)
/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
(the warning above repeats 7 more times)
Traceback (most recent call last):
  File "/xxx/devit/tools/train_net.py", line 202, in <module>
    launch(
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 125, in _distributed_worker
    main_func(*args)
  File "/xxx/devit/tools/train_net.py", line 178, in main
    res = Trainer.test(cfg, model)
  File "/xxx/devit/tools/../detectron2/engine/defaults.py", line 618, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/xxx/devit/tools/../detectron2/evaluation/evaluator.py", line 159, in inference_on_dataset
    outputs = model(inputs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/devit/tools/../detectron2/modeling/meta_arch/devit.py", line 1309, in forward
    embedding = rp(embedding)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.82 GiB (GPU 6; 23.69 GiB total capacity; 19.57 GiB already allocated; 2.44 GiB free; 19.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

/xxx/anaconda3/envs/devit/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 140 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
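
The allocator hint at the end of the OOM message can be tried directly; a minimal sketch (128 is just an example split size, and this alone may not be enough given how much memory is already allocated by the model):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
vit=b task=ovd dataset=coco bash scripts/eval.sh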

study-hard-forever avatar Oct 31 '23 07:10 study-hard-forever

Hi @study-hard-forever, for the first problem I would suggest reducing K from 10 to 5 or 3. According to the ablation study it shouldn't hurt accuracy too much, but it will significantly reduce the memory requirement. The reason is that, because C-way classification is turned into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly $O(K^2)$.
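
A minimal sketch of that change, assuming K here is the DE.TOPK entry in configs/open-vocabulary/coco/vitb.yaml (shown as TOPK: 10 in the config dump above); edit the file and re-run the evaluation:

sed -i 's/TOPK: 10/TOPK: 5/' configs/open-vocabulary/coco/vitb.yaml
vit=b task=ovd dataset=coco bash scripts/eval.sh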

For the second problem, I used to run into that a lot as well. The solution could be the following steps:

  1. Use torch 1.x (e.g. 1.13) instead of torch 2.
  2. Install xformers 0.0.18 from source (you have to, because it is not available from PyPI); see the build sketch after the commit reference below. The memory consumption of the ViT will then be reduced a lot, and the warnings will mostly be gone.

mlzxy avatar Oct 31 '23 15:10 mlzxy

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)
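
A minimal sketch of building that version from source (assuming a standard xformers checkout with its submodules; the CUDA build can take a while, and ninja is optional but speeds it up):

git clone https://github.com/facebookresearch/xformers.git
cd xformers
git checkout da278628761928581f52b81647d82236a88c3afb   # tag v0.0.18
git submodule update --init --recursive
pip install ninja   # optional, speeds up compilation
pip install -e .
python -c "import xformers; print(xformers.__version__)"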

mlzxy avatar Oct 31 '23 16:10 mlzxy

Thanks for your reply. I remember I also had some problems yesterday when installing version 0.0.18 from source; I'll try it again later, and I'll also try turning K down a bit. Thank you.

study-hard-forever avatar Nov 01 '23 03:11 study-hard-forever

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)

Installing xformers 0.0.18 from this source works on a V100, GOOD!

Anymake avatar Nov 22 '23 13:11 Anymake

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)

Installing xformers 0.0.18 from this source works on a V100, GOOD!

OK, thank you for your reply. If I get the chance in the future, I will try it on a V100.

study-hard-forever avatar Nov 22 '23 13:11 study-hard-forever

Hi @study-hard-forever, for the first problem I would suggest reducing K from 10 to 5 or 3. According to the ablation study it shouldn't hurt accuracy too much, but it will significantly reduce the memory requirement. The reason is that, because C-way classification is turned into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly $O(K^2)$.

For the second problem, I used to run into that a lot as well. The solution could be the following steps:

  1. Use torch 1.x (e.g. 1.13) instead of torch 2.
  2. Install xformers 0.0.18 from source (you have to, because it is not available from PyPI). The memory consumption of the ViT will then be reduced a lot, and the warnings will mostly be gone.

I just reduced TOPK from 10 to 3, but I don't know whether this will reduce the accuracy of the model.

RuoyuChen10 avatar Jan 19 '24 05:01 RuoyuChen10