
Out-of-memory error in COCO evaluation, even though I used 8 RTX 3090 GPUs and did not encounter it when evaluating the LVIS dataset

Open study-hard-forever opened this issue 2 years ago • 6 comments

  1. I encountered a torch.cuda.OutOfMemoryError: CUDA out of memory error, even though I used 8 RTX 3090 GPUs and did not run into it when evaluating the LVIS dataset. I tried reducing the batch size to 8 and even switching to vit_b/vit_s, but that did not solve it. As a side note, the vit_s configuration file configs/open-vocabulary/coco/vits.yaml has a file-naming error: CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vitl14.pth" should be CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vits14.pth" (see the sketch after this list).
  2. The second problem is that I get a lot of warning messages (which may not affect normal operation), but I noticed the message "xFormers not available" and I do not understand exactly what it means, so could you please explain it? I tried installing other versions of xFormers, such as xformers==0.0.18 and xformers==0.0.20, but they depend on torch 2, which conflicts with the detectron2 setup described in the official instructions (https://github.com/facebookresearch/detectron2/releases), so that did not go as smoothly as you suggested.
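
A minimal sketch of the fix for the naming error in point 1 (assuming the released prototype file is indeed named class_prototypes_all.vits14.pth; back up the config first):

cp configs/open-vocabulary/coco/vits.yaml configs/open-vocabulary/coco/vits.yaml.bak
sed -i 's/class_prototypes_all\.vitl14\.pth/class_prototypes_all.vits14.pth/' configs/open-vocabulary/coco/vits.yaml
grep CLASS_PROTOTYPES configs/open-vocabulary/coco/vits.yaml   # verify the corrected path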

Note: I followed the configuration exactly as given in your tutorial:

git clone https://github.com/mlzxy/devit.git
conda create -n devit python=3.9
conda activate devit
pip install -r devit/requirements.txt
pip install -e ./devit

but still encountered the problem mentioned in the first question.
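
A quick sanity check of the resulting environment (a minimal sketch; the expected versions are the ones shown in the environment info in the log below):

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
python -c "import detectron2; print(detectron2.__file__)"
python -c "import xformers" 2>/dev/null || echo "xFormers not installed"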

Looking forward to your reply, thanks!

command: vit=b task=ovd dataset=coco bash scripts/eval.sh

The logs are as follows:

task=ovd, vit=b, dataset=coco, shot=10, split=1, num_gpus=8
xFormers not available
Command Line Args: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
xFormers not available
[10/31 14:52:24 detectron2]: Rank of current process: 0. World size: 8
[10/31 14:52:25 detectron2]: Environment info:


sys.platform             linux
Python                   3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
numpy                    1.22.4
detectron2               RegionCLIP @/xxx/devit/tools/../detectron2
Compiler                 GCC 9.4
CUDA compiler            CUDA 11.8
detectron2 arch flags    8.6
DETECTRON2_ENV_MODULE
PyTorch                  1.13.1+cu117 @/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch
PyTorch debug build      False
GPU available            True
GPU 0,1,2,3,4,5,6,7      NVIDIA GeForce RTX 3090 (arch=8.6)
CUDA_HOME                /usr/local/cuda-11.8
Pillow                   9.5.0
torchvision              0.14.1+cu117 @/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torchvision
torchvision arch flags   3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                   0.1.5.post20221221
iopath                   0.1.8
cv2                      4.8.1


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.7
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.5
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[10/31 14:52:25 detectron2]: Command line arguments: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
[10/31 14:52:25 detectron2]: Contents of args.config_file=configs/open-vocabulary/coco/vitb.yaml:
_BASE_: "../../Base-RCNN-C4.yaml"
DE:
  CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_base.vitb14.pth,weights/initial/open-vocabulary/prototypes/coco/class_prototypes_novel.vitb14.pth"
  BG_PROTOTYPES: "weights/initial/background/background_prototypes.vitb14.pth"
  BG_CLS_LOSS_WEIGHT: 0.2
  TOPK: 10

MODEL:
  META_ARCHITECTURE: "OpenSetDetectorWithExamples"
  BACKBONE:
    NAME: "build_dino_v2_vit"
    TYPE: "base"
  WEIGHTS: ""
  MASK_ON: False
  RPN:
    HEAD_NAME: StandardRPNHead
    IN_FEATURES: ["res4"]
  ROI_HEADS:
    SCORE_THRESH_TEST: 0.001
  ROI_BOX_HEAD:
    NAME: ""
    NUM_FC: 0
    POOLER_RESOLUTION: 7
    CLS_AGNOSTIC_BBOX_REG: True
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
DATASETS:
  TRAIN: ("coco_2017_ovd_b_train",)
  TEST: ("coco_2017_ovd_all_test",)
TEST:
  EVAL_PERIOD: 5000
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.002
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  WARMUP_ITERS: 5000
  CHECKPOINT_PERIOD: 5000

INPUT:
  MIN_SIZE_TRAIN_SAMPLING: choice
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
  FORMAT: "RGB"

[10/31 14:52:25 detectron2]: Full config saved to output/eval/open-vocabulary/coco/vitb/config.yaml
[10/31 14:52:25 d2.utils.env]: Using a generated random seed 26738411
('coco_2017_ovd_all_test',)
[10/31 14:52:33 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/trained/open-vocabulary/coco/vitb_0079999.pth ...
[10/31 14:52:34 d2.data.datasets.coco]: Loaded 4836 images in COCO format from datasets/coco/annotations/ovd_ins_val2017_all.json
[10/31 14:52:34 d2.data.build]: Distribution of instances among all 65 categories:

category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
bench 411 bird 427 cat 202
dog 218 horse 272 sheep 354
cow 372 elephant 252 bear 71
zebra 266 giraffe 232 backpack 371
umbrella 407 handbag 540 tie 252
suitcase 299 frisbee 115 skis 241
snowboard 69 kite 327 skateboard 179
surfboard 267 bottle 1013 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 pizza 284 donut 328
cake 310 chair 1771 couch 261
bed 163 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 microwave 55 oven 143
toaster 9 sink 225 refrigerator 126
book 1129 clock 267 vase 274
scissors 36 toothbrush 57
total 32721
[10/31 14:52:34 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[10/31 14:52:34 d2.data.common]: Serializing 4836 elements to byte tensors and concatenating them all ...
[10/31 14:52:34 d2.data.common]: Serialized dataset takes 17.62 MiB
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
[10/31 14:52:35 d2.evaluation.evaluator]: Start inference on 605 batches
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
('coco_2017_ovd_all_test',)
xFormers not available
(the line above repeats 31 more times)
/xxx/devit/tools/../detectron2/structures/boxes.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
(the warning above repeats 31 more times)
/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
(the warning above repeats 7 more times)
Traceback (most recent call last):
  File "/xxx/devit/tools/train_net.py", line 202, in <module>
    launch(
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 125, in _distributed_worker
    main_func(*args)
  File "/xxx/devit/tools/train_net.py", line 178, in main
    res = Trainer.test(cfg, model)
  File "/xxx/devit/tools/../detectron2/engine/defaults.py", line 618, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/xxx/devit/tools/../detectron2/evaluation/evaluator.py", line 159, in inference_on_dataset
    outputs = model(inputs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/devit/tools/../detectron2/modeling/meta_arch/devit.py", line 1309, in forward
    embedding = rp(embedding)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.82 GiB (GPU 6; 23.69 GiB total capacity; 19.57 GiB already allocated; 2.44 GiB free; 19.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

/xxx/anaconda3/envs/devit/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 140 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
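
The allocator hint at the end of the OOM message can be tried directly; a minimal sketch (128 is just an example split size, and this alone may not be enough given how much memory is already allocated by the model):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
vit=b task=ovd dataset=coco bash scripts/eval.sh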

study-hard-forever avatar Oct 31 '23 07:10 study-hard-forever

Hi @study-hard-forever, for the first problem I would suggest reducing K from 10 to 5 or 3. According to the ablation study it shouldn't hurt accuracy too much, but it will significantly reduce the memory requirement. The reason is that, because C-way classification is turned into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly $O(K^2)$.
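
A minimal sketch of that change, assuming K here is the DE.TOPK entry in configs/open-vocabulary/coco/vitb.yaml (shown as TOPK: 10 in the config dump above); edit the file and re-run the evaluation:

sed -i 's/TOPK: 10/TOPK: 5/' configs/open-vocabulary/coco/vitb.yaml
vit=b task=ovd dataset=coco bash scripts/eval.sh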

For the second problem, I used to run into that a lot as well. The solution could be the following steps:

  1. Use torch 1.x (e.g. 1.13) instead of torch 2.
  2. Install xformers 0.0.18 from source (you have to, because it is not available from PyPI); see the build sketch after the commit reference below. The memory consumption of the ViT will then be reduced a lot, and the warnings will mostly be gone.

mlzxy avatar Oct 31 '23 15:10 mlzxy

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)
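
A minimal sketch of building that version from source (assuming a standard xformers checkout with its submodules; the CUDA build can take a while, and ninja is optional but speeds it up):

git clone https://github.com/facebookresearch/xformers.git
cd xformers
git checkout da278628761928581f52b81647d82236a88c3afb   # tag v0.0.18
git submodule update --init --recursive
pip install ninja   # optional, speeds up compilation
pip install -e .
python -c "import xformers; print(xformers.__version__)"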

mlzxy avatar Oct 31 '23 16:10 mlzxy

Thanks for your reply. I remember I also had some problems yesterday when installing version 0.0.18 from source; I'll try it again later, and I'll also try turning K down a bit. Thank you.

study-hard-forever avatar Nov 01 '23 03:11 study-hard-forever

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)

Installing xformers 0.0.18 from this source works on a V100, GOOD!

Anymake avatar Nov 22 '23 13:11 Anymake

Note that when I used certain versions of xformers on some GPUs (a 2080 Ti, if I remember correctly), it sometimes produced inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:

commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)

Installing xformers 0.0.18 from this source works on a V100, GOOD!

OK, thank you for your reply. If I get the chance in the future, I will try it on a V100.

study-hard-forever avatar Nov 22 '23 13:11 study-hard-forever

Hi @study-hard-forever, for the first problem I would suggest reducing K from 10 to 5 or 3. According to the ablation study it shouldn't hurt accuracy too much, but it will significantly reduce the memory requirement. The reason is that, because C-way classification is turned into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly $O(K^2)$.

For the second problem, I used to run into that a lot as well. The solution could be the following steps:

  1. Use torch 1.x (e.g. 1.13) instead of torch 2.
  2. Install xformers 0.0.18 from source (you have to, because it is not available from PyPI). The memory consumption of the ViT will then be reduced a lot, and the warnings will mostly be gone.

I just reduced TOPK from 10 to 3, but I don't know whether this will reduce the accuracy of the model.

RuoyuChen10 avatar Jan 19 '24 05:01 RuoyuChen10