FastMaskRCNN

After running train.py for 3 hours, GPU is out of memory trying to allocate 146.88MiB

Open mattdingmeng opened this issue 8 years ago • 9 comments

After three hours of training, the GPU ran out of memory while trying to allocate 146.88 MiB. My Titan Black GPU has 6 GB of memory in total. Is this a bug in the code, or do I have to upgrade my GPU?

mattdingmeng avatar May 23 '17 14:05 mattdingmeng

@mattdingmeng Hi, can you tell me how you fixed this problem? I hit the same problem when running 'python train.py'. Thank you very much. (image attachment)

zhanglijian avatar May 24 '17 06:05 zhanglijian

@zhanglijian I have not fixed the out-of-memory problem yet. I am using a Titan Black with 6 GB.

mattdingmeng avatar May 24 '17 14:05 mattdingmeng

@mattdingmeng Hi, my problem looks like this: (image attachment). My COCO data is laid out like this: (image attachment)

zhanglijian avatar May 25 '17 06:05 zhanglijian

@zhanglijian You need to adjust the paths to this data in the code. Find each path and change it to your own.

mattdingmeng avatar May 26 '17 15:05 mattdingmeng

@mattdingmeng Thanks a lot

zhanglijian avatar May 27 '17 06:05 zhanglijian

I also have the same out-of-memory problem; my GPU has 12 GB. The log messages:
iter 13826: image-id:0485080, time:0.440(sec), regular_loss: 0.156694, total-loss 0.1656(0.0106, 0.1222, 0.000000, 0.0328, 0.0000), instances: 1, batch:(1|33, 0|64, 0|0)
iter 13827: image-id:0211918, time:0.545(sec), regular_loss: 0.156691, total-loss 0.5308(0.0438, 0.2778, 0.008190, 0.1686, 0.0324), instances: 6, batch:(38|160, 13|77, 13|13)
out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
2017-05-25 16:36:43.037814: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2017-05-25 16:36:43.037864: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1

guoyilin avatar Jun 05 '17 11:06 guoyilin
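
Nobody in this thread has identified the cause, so the following is only a diagnostic aid and not a fix. It is a minimal sketch that parses `iter ...` lines in the format guoyilin posted above, so the last few iterations before a crash can be compared (for example, to see whether the failing image had unusually many instances). The names `LINE_RE`, `parse_log`, and `SAMPLE` are hypothetical and not part of FastMaskRCNN, and the meaning of the `a|b` pairs in `batch:(...)` is not stated in the thread, so they are kept as raw integer pairs without interpretation.

```python
import re

# Pattern for one training log line, derived only from the two sample
# lines quoted in this thread; other log formats will not match.
LINE_RE = re.compile(
    r"iter (?P<iter>\d+): image-id:(?P<image_id>\d+), "
    r"time:(?P<time>[\d.]+)\(sec\), .*?"
    r"total-loss (?P<total_loss>[\d.]+)\(.*?\), "
    r"instances: (?P<instances>\d+), "
    r"batch:\((?P<batch>[^)]*)\)"
)


def parse_log(text):
    """Return one dict per 'iter ...' record found in text."""
    records = []
    for m in LINE_RE.finditer(text):
        # "38|160, 13|77, 13|13" -> [(38, 160), (13, 77), (13, 13)]
        pairs = [
            tuple(int(n) for n in part.split("|"))
            for part in m.group("batch").split(",")
        ]
        records.append({
            "iter": int(m.group("iter")),
            "image_id": m.group("image_id"),
            "time_sec": float(m.group("time")),
            "total_loss": float(m.group("total_loss")),
            "instances": int(m.group("instances")),
            "batch": pairs,  # meaning of each pair is not documented here
        })
    return records


# The two log lines quoted in the comment above.
SAMPLE = (
    "iter 13826: image-id:0485080, time:0.440(sec), regular_loss: 0.156694, "
    "total-loss 0.1656(0.0106, 0.1222, 0.000000, 0.0328, 0.0000), "
    "instances: 1, batch:(1|33, 0|64, 0|0)\n"
    "iter 13827: image-id:0211918, time:0.545(sec), regular_loss: 0.156691, "
    "total-loss 0.5308(0.0438, 0.2778, 0.008190, 0.1686, 0.0324), "
    "instances: 6, batch:(38|160, 13|77, 13|13)\n"
)

if __name__ == "__main__":
    for rec in parse_log(SAMPLE):
        print(rec["iter"], rec["image_id"], rec["instances"], rec["batch"])
```

To use it on a real run, redirect the training output to a file, pass the file contents to `parse_log`, and look at the last few records before the "out of memory" message.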

P100, 16 GB: at around 460 thousand iterations I hit the same out-of-memory error.

sevenseablue avatar Aug 17 '17 10:08 sevenseablue

@guoyilin, have you solved it?

sevenseablue avatar Aug 17 '17 10:08 sevenseablue

No.

guoyilin avatar Aug 18 '17 08:08 guoyilin