
torch.cuda.memory_stats returns all 0s

Open ericjang opened this issue 5 years ago • 2 comments

🐛 Bug

Calling torch.cuda.memory_stats on a gfx900 GPU (Frontier Vega), or any of the memory-management methods listed at https://pytorch.org/docs/stable/cuda.html#memory-management, returns all zeros.

To Reproduce

Steps to reproduce the behavior:

  1. Run rocm/pytorch 3.3.
  2. In Python, start training a model on the GPU.
  3. In a separate Python process, run:

import torch
torch.cuda.memory_stats(0)

The call returns all zeros. Output:

OrderedDict([('active.all.allocated', 0),
             ('active.all.current', 0),
             ('active.all.freed', 0),
             ('active.all.peak', 0),
             ('active.large_pool.allocated', 0),
             ('active.large_pool.current', 0),
             ('active.large_pool.freed', 0),
             ('active.large_pool.peak', 0),
             ('active.small_pool.allocated', 0),
             ('active.small_pool.current', 0),
             ('active.small_pool.freed', 0),
             ('active.small_pool.peak', 0),
             ('active_bytes.all.allocated', 0),
             ('active_bytes.all.current', 0),
             ('active_bytes.all.freed', 0),
             ('active_bytes.all.peak', 0),
             ('active_bytes.large_pool.allocated', 0),
             ('active_bytes.large_pool.current', 0),
             ('active_bytes.large_pool.freed', 0),
             ('active_bytes.large_pool.peak', 0),
             ('active_bytes.small_pool.allocated', 0),
             ('active_bytes.small_pool.current', 0),
             ('active_bytes.small_pool.freed', 0),
             ('active_bytes.small_pool.peak', 0),
             ('allocated_bytes.all.allocated', 0),
             ('allocated_bytes.all.current', 0),
             ('allocated_bytes.all.freed', 0),
             ('allocated_bytes.all.peak', 0),
             ('allocated_bytes.large_pool.allocated', 0),
             ('allocated_bytes.large_pool.current', 0),
             ('allocated_bytes.large_pool.freed', 0),
             ('allocated_bytes.large_pool.peak', 0),
             ('allocated_bytes.small_pool.allocated', 0),
             ('allocated_bytes.small_pool.current', 0),
             ('allocated_bytes.small_pool.freed', 0),
             ('allocated_bytes.small_pool.peak', 0),
             ('allocation.all.allocated', 0),
             ('allocation.all.current', 0),
             ('allocation.all.freed', 0),
             ('allocation.all.peak', 0),
             ('allocation.large_pool.allocated', 0),
             ('allocation.large_pool.current', 0),
             ('allocation.large_pool.freed', 0),
             ('allocation.large_pool.peak', 0),
             ('allocation.small_pool.allocated', 0),
             ('allocation.small_pool.current', 0),
             ('allocation.small_pool.freed', 0),
             ('allocation.small_pool.peak', 0),
             ('inactive_split.all.allocated', 0),
             ('inactive_split.all.current', 0),
             ('inactive_split.all.freed', 0),
             ('inactive_split.all.peak', 0),
             ('inactive_split.large_pool.allocated', 0),
             ('inactive_split.large_pool.current', 0),
             ('inactive_split.large_pool.freed', 0),
             ('inactive_split.large_pool.peak', 0),
             ('inactive_split.small_pool.allocated', 0),
             ('inactive_split.small_pool.current', 0),
             ('inactive_split.small_pool.freed', 0),
             ('inactive_split.small_pool.peak', 0),
             ('inactive_split_bytes.all.allocated', 0),
             ('inactive_split_bytes.all.current', 0),
             ('inactive_split_bytes.all.freed', 0),
             ('inactive_split_bytes.all.peak', 0),
             ('inactive_split_bytes.large_pool.allocated', 0),
             ('inactive_split_bytes.large_pool.current', 0),
             ('inactive_split_bytes.large_pool.freed', 0),
             ('inactive_split_bytes.large_pool.peak', 0),
             ('inactive_split_bytes.small_pool.allocated', 0),
             ('inactive_split_bytes.small_pool.current', 0),
             ('inactive_split_bytes.small_pool.freed', 0),
             ('inactive_split_bytes.small_pool.peak', 0),
             ('num_alloc_retries', 0),
             ('num_ooms', 0),
             ('reserved_bytes.all.allocated', 0),
             ('reserved_bytes.all.current', 0),
             ('reserved_bytes.all.freed', 0),
             ('reserved_bytes.all.peak', 0),
             ('reserved_bytes.large_pool.allocated', 0),
             ('reserved_bytes.large_pool.current', 0),
             ('reserved_bytes.large_pool.freed', 0),
             ('reserved_bytes.large_pool.peak', 0),
             ('reserved_bytes.small_pool.allocated', 0),
             ('reserved_bytes.small_pool.current', 0),
             ('reserved_bytes.small_pool.freed', 0),
             ('reserved_bytes.small_pool.peak', 0),
             ('segment.all.allocated', 0),
             ('segment.all.current', 0),
             ('segment.all.freed', 0),
             ('segment.all.peak', 0),
             ('segment.large_pool.allocated', 0),
             ('segment.large_pool.current', 0),
             ('segment.large_pool.freed', 0),
             ('segment.large_pool.peak', 0),
             ('segment.small_pool.allocated', 0),
             ('segment.small_pool.current', 0),
             ('segment.small_pool.freed', 0),
             ('segment.small_pool.peak', 0)])

Expected behavior

Even though I'm using an AMD GPU, I expect the memory stats to have an AMD analogue that can be reported through torch.cuda.memory_stats.
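If the goal is to watch GPU memory from a separate monitoring process, one possible workaround (a sketch, assuming a reasonably recent PyTorch that provides torch.cuda.mem_get_info, which wraps the driver-level free/total query) is to read device-wide usage instead of the per-process allocator stats:

```python
import torch

def device_memory_used(device=0):
    """Device-wide used VRAM in bytes as reported by the driver (covers all
    processes), or None on a machine without a GPU (guard added for CPU-only runs)."""
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return total_bytes - free_bytes

print(device_memory_used())
```

Unlike torch.cuda.memory_stats, this reflects allocations made by every process on the device, so it stays meaningful when called from a separate monitor.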

ericjang avatar Aug 02 '20 06:08 ericjang

I tried this recently on ROCm 3.3 by instrumenting the Python code to print torch.cuda.memory_stats from the same process, and it worked. Is there a reason you're trying to print it from a different process?
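For reference, a minimal in-process sketch of this behavior: the counters in torch.cuda.memory_stats come from the caching allocator, which is kept per process, so they are only non-zero in the process that made the allocations. (The torch.cuda.is_available() guard is mine, so the snippet degrades gracefully on a CPU-only build.)

```python
import torch

def current_allocated_bytes(device=0):
    """Bytes currently allocated by *this* process's caching allocator,
    or None when no GPU is available."""
    if not torch.cuda.is_available():
        return None
    return torch.cuda.memory_stats(device)["allocated_bytes.all.current"]

if torch.cuda.is_available():
    x = torch.empty(1024, 1024, device="cuda")  # ~4 MiB of float32
    # Non-zero here; a separate process querying the same device still sees 0,
    # because it has its own (empty) caching allocator.
    print(current_allocated_bytes())
```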

jithunnair-amd avatar Aug 02 '20 13:08 jithunnair-amd

Hi @ericjang, can you reply to @jithunnair-amd's query? Please let us know if this issue is still reproducible on your end, thanks.

sunway513 avatar Mar 01 '21 16:03 sunway513