CUDA out-of-memory error during COCO evaluation on 8× RTX 3090 GPUs (LVIS evaluation works fine)
- I encountered `torch.cuda.OutOfMemoryError: CUDA out of memory` when evaluating on COCO, even though I used 8 RTX 3090 GPUs; I did not hit this when evaluating the LVIS dataset. I tried reducing the batch size to 8 and even switched to vit_b/vit_s, but the problem persists. Incidentally, the vit_s configuration file `configs/open-vocabulary/coco/vits.yaml` has a file-naming error: `CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vitl14.pth"` should be `CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_all.vits14.pth"`.
- The second problem is that I see a lot of warning messages (although these may not affect normal operation). In particular, I noticed the warning "xFormers not available", which I don't really understand, so I would appreciate an explanation. I tried installing other versions of xFormers, such as xformers==0.0.18 and xformers==0.0.20, but they depend on torch 2, which conflicts with the detectron2 framework as described in the official detectron2 release notes: https://github.com/facebookresearch/detectron2/releases. So that did not go as smoothly as you described.
Note: I followed the configuration exactly as given in your tutorial:
git clone https://github.com/mlzxy/devit.git
conda create -n devit python=3.9
conda activate devit
pip install -r devit/requirements.txt
pip install -e ./devit
but still encountered the problem mentioned in the first question.
Looking forward to your reply, thanks!
command: `vit=b task=ovd dataset=coco bash scripts/eval.sh`
The logs are as follows:
task=ovd, vit=b, dataset=coco, shot=10, split=1, num_gpus=8
xFormers not available
Command Line Args: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
xFormers not available
(the line above is repeated 8 times, once per process)
[10/31 14:52:24 detectron2]: Rank of current process: 0. World size: 8
[10/31 14:52:25 detectron2]: Environment info:
sys.platform linux
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
numpy 1.22.4
detectron2 RegionCLIP @/xxx/devit/tools/../detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.8
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE
PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.7
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.5
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
[10/31 14:52:25 detectron2]: Command line arguments: Namespace(config_file='configs/open-vocabulary/coco/vitb.yaml', resume=False, eval_only=True, num_gpus=8, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', 'weights/trained/open-vocabulary/coco/vitb_0079999.pth', 'DE.OFFLINE_RPN_CONFIG', 'configs/RPN/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml', 'OUTPUT_DIR', 'output/eval/open-vocabulary/coco/vitb/'])
[10/31 14:52:25 detectron2]: Contents of args.config_file=configs/open-vocabulary/coco/vitb.yaml:
_BASE_: "../../Base-RCNN-C4.yaml"
DE:
  CLASS_PROTOTYPES: "weights/initial/open-vocabulary/prototypes/coco/class_prototypes_base.vitb14.pth,weights/initial/open-vocabulary/prototypes/coco/class_prototypes_novel.vitb14.pth"
  BG_PROTOTYPES: "weights/initial/background/background_prototypes.vitb14.pth"
  BG_CLS_LOSS_WEIGHT: 0.2
  TOPK: 10
MODEL:
  META_ARCHITECTURE: "OpenSetDetectorWithExamples"
  BACKBONE:
    NAME: "build_dino_v2_vit"
    TYPE: "base"
  WEIGHTS: ""
  MASK_ON: False
  RPN:
    HEAD_NAME: StandardRPNHead
    IN_FEATURES: ["res4"]
  ROI_HEADS:
    SCORE_THRESH_TEST: 0.001
  ROI_BOX_HEAD:
    NAME: ""
    NUM_FC: 0
    POOLER_RESOLUTION: 7
    CLS_AGNOSTIC_BBOX_REG: True
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
DATASETS:
  TRAIN: ("coco_2017_ovd_b_train",)
  TEST: ("coco_2017_ovd_all_test",)
TEST:
  EVAL_PERIOD: 5000
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.002
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  WARMUP_ITERS: 5000
  CHECKPOINT_PERIOD: 5000
INPUT:
  MIN_SIZE_TRAIN_SAMPLING: choice
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
  FORMAT: "RGB"
[10/31 14:52:25 detectron2]: Full config saved to output/eval/open-vocabulary/coco/vitb/config.yaml
[10/31 14:52:25 d2.utils.env]: Using a generated random seed 26738411
('coco_2017_ovd_all_test',)
[10/31 14:52:33 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/trained/open-vocabulary/coco/vitb_0079999.pth ...
[10/31 14:52:34 d2.data.datasets.coco]: Loaded 4836 images in COCO format from datasets/coco/annotations/ovd_ins_val2017_all.json
[10/31 14:52:34 d2.data.build]: Distribution of instances among all 65 categories:
| category | #instances | category | #instances | category | #instances |
|---|---|---|---|---|---|
| person | 10777 | bicycle | 314 | car | 1918 |
| motorcycle | 367 | airplane | 143 | bus | 283 |
| train | 190 | truck | 414 | boat | 424 |
| bench | 411 | bird | 427 | cat | 202 |
| dog | 218 | horse | 272 | sheep | 354 |
| cow | 372 | elephant | 252 | bear | 71 |
| zebra | 266 | giraffe | 232 | backpack | 371 |
| umbrella | 407 | handbag | 540 | tie | 252 |
| suitcase | 299 | frisbee | 115 | skis | 241 |
| snowboard | 69 | kite | 327 | skateboard | 179 |
| surfboard | 267 | bottle | 1013 | cup | 895 |
| fork | 215 | knife | 325 | spoon | 253 |
| bowl | 623 | banana | 370 | apple | 236 |
| sandwich | 177 | orange | 285 | broccoli | 312 |
| carrot | 365 | pizza | 284 | donut | 328 |
| cake | 310 | chair | 1771 | couch | 261 |
| bed | 163 | toilet | 179 | tv | 288 |
| laptop | 231 | mouse | 106 | remote | 283 |
| keyboard | 153 | microwave | 55 | oven | 143 |
| toaster | 9 | sink | 225 | refrigerator | 126 |
| book | 1129 | clock | 267 | vase | 274 |
| scissors | 36 | toothbrush | 57 | ||
| total | 32721 | ||||
[10/31 14:52:34 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[10/31 14:52:34 d2.data.common]: Serializing 4836 elements to byte tensors and concatenating them all ...
[10/31 14:52:34 d2.data.common]: Serialized dataset takes 17.62 MiB
('coco_2017_ovd_all_test',)
(the line above is printed once per worker process)
[10/31 14:52:35 d2.evaluation.evaluator]: Start inference on 605 batches
xFormers not available
(the line above is repeated 32 times)
/xxx/devit/tools/../detectron2/structures/boxes.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
  tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
(the warning above is repeated 32 times)
/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
(the warning above is repeated 8 times)
Traceback (most recent call last):
  File "/xxx/devit/tools/train_net.py", line 202, in <module>
    launch(
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/xxx/devit/tools/../detectron2/engine/launch.py", line 125, in _distributed_worker
    main_func(*args)
  File "/xxx/devit/tools/train_net.py", line 178, in main
    res = Trainer.test(cfg, model)
  File "/xxx/devit/tools/../detectron2/engine/defaults.py", line 618, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/xxx/devit/tools/../detectron2/evaluation/evaluator.py", line 159, in inference_on_dataset
    outputs = model(inputs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/devit/tools/../detectron2/modeling/meta_arch/devit.py", line 1309, in forward
    embedding = rp(embedding)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/xxx/anaconda3/envs/devit/lib/python3.9/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.82 GiB (GPU 6; 23.69 GiB total capacity; 19.57 GiB already allocated; 2.44 GiB free; 19.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

/xxx/anaconda3/envs/devit/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 140 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d ')
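I also noticed that the error message itself suggests setting `max_split_size_mb` to reduce fragmentation. That could be tried like this (the value 128 is an arbitrary choice, not something from the repo):

```shell
# Hint the PyTorch caching allocator to split large blocks, as the OOM message
# suggests, before launching evaluation (128 MiB is an arbitrary starting value).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
vit=b task=ovd dataset=coco bash scripts/eval.sh
```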
Hi @study-hard-forever, for the first problem I would suggest reducing K from 10 to 5 or 3. According to the ablation study this shouldn't hurt accuracy too much, but it will significantly reduce the memory requirement. The reason is that, because C-way classification is turned into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly $O(K^2)$.
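To put that scaling in numbers, here is a back-of-envelope sketch (not the actual DE-ViT code) of how a memory footprint that grows as $O(K^2)$ shrinks when K is reduced:

```python
# Under the assumption that per-GPU memory of the RCNN branch scales ~ K^2,
# the relative footprint when changing top-K is just the ratio of squares.
def relative_memory(k_old: int, k_new: int) -> float:
    """Approximate memory reduction factor when lowering K, assuming O(K^2)."""
    return (k_old ** 2) / (k_new ** 2)

print(relative_memory(10, 5))  # K=10 -> K=5: ~4x less memory
print(relative_memory(10, 3))  # K=10 -> K=3: ~11x less memory
```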
For the second problem, I used to see that a lot as well. The solution is the following steps:

- Use `torch 1.x` (e.g. `1.13`) instead of `torch 2`.
- Install xformers 0.0.18 from source (you have to, because it is not available from PyPI). The memory consumption of the ViT will then be reduced a lot, and the warnings will mostly go away.
Note that on some GPUs (the 2080 Ti, as I recall) certain versions of xformers sometimes produce inaccurate inference results. I don't know why, but I would recommend installing this version of xformers from source:
commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)
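A source install at that tag would look roughly like this (a sketch; it assumes git, a C++ toolchain, and a CUDA toolkit matching your PyTorch build are available):

```shell
# Build xformers 0.0.18 from source at the tagged commit.
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git checkout v0.0.18   # commit da278628761928581f52b81647d82236a88c3afb
git submodule update --init --recursive
pip install -e .
```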
Thanks for your reply. I also ran into some problems yesterday when installing version 0.0.18 from source; I'll try it again later. I'll also try turning K down a bit. Thank you.
> Note that when I use certain versions of xformers on some GPU, e.g., 2080 Ti I remember, it sometimes produces inaccurate inference results. Don't know why but I would recommend you install this version of xformers from source.
>
> commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)
Installing xformers 0.0.18 from this source works on a V100, GOOD!
> Note that when I use certain versions of xformers on some GPU, e.g., 2080 Ti I remember, it sometimes produces inaccurate inference results. Don't know why but I would recommend you install this version of xformers from source.
>
> commit da278628761928581f52b81647d82236a88c3afb (HEAD, tag: v0.0.18)
>
> Install xformers 0.0.18 from this source works on V100, GOOD!
OK, thank you for your reply. If I get the chance in the future, I will try it on a V100.
> Hi @study-hard-forever, for the first problem, I would suggest reducing K from 10 to 5 or 3. It shouldn't hurt accuracy too much according to the ablation study but will significantly reduce memory requirements. The fact is that because of turning C-classification into C binary classifications, the space complexity (on a single GPU) of the RCNN branch becomes roughly O(K^2).
>
> For the second problem, I also had that a lot before. The solution could be the following steps:
>
> - Use `torch 1.x` (e.g. `1.13`) instead of `torch 2`
> - Install xformers 0.0.18 from source (have to because it is not available from PyPI). Then the memory consumption of ViT will be reduced a lot, and the warnings will be mostly gone.
I just reduced TOPK from 10 to 3, but I don't know whether this will reduce the model's accuracy.
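For reference, that corresponds to this one-line change in the `DE` section of `configs/open-vocabulary/coco/vitb.yaml` (the prototype paths stay as in the repo):

```yaml
DE:
  # CLASS_PROTOTYPES / BG_PROTOTYPES unchanged
  BG_CLS_LOSS_WEIGHT: 0.2
  TOPK: 3   # was 10; memory of the RCNN branch scales roughly with K^2
```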