
PyTorchProfiler: not showing CPU memory used even with `profile_memory=True`

Open Jack12xl opened this issue 1 year ago • 0 comments

Bug description

I'm trying to use PyTorchProfiler (https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.profilers.PyTorchProfiler.html) to track down some OOM (CPU memory) issues. My configuration:

profiler = PyTorchProfiler(
    dirpath=log_dir,  # Directory to save logs
    filename="memory_profile",  # Name of the file to save results
    sort_by_key="self_cpu_memory_usage",  # Sort by CPU memory usage
    export_to_chrome=True,  # Export as JSON for Chrome
    row_limit=16,
    activities=[torch.profiler.ProfilerActivity.CPU],
    profile_memory=True,  # Record CPU memory usage
    with_stack=True,
    record_shapes=True,
)

trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()

I expected results similar to the native PyTorch profiler (https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-profiler-to-analyze-memory-consumption), but the output still only shows CPU/GPU time columns (see attached screenshot).

I don't know if this is a bug (I thought PyTorchProfiler was a wrapper around the native PyTorch profiler, so I expected similar behavior when setting profile_memory=True).
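For comparison, this is a minimal native-profiler run (plain torch, no Lightning) that does produce memory columns: with profile_memory=True, the summary table gains CPU Mem / Self CPU Mem columns, which is what I expected from the Lightning wrapper too.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Native PyTorch profiler: profile_memory=True records allocations
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    x = torch.randn(256, 256)
    y = x @ x  # a small workload that allocates tensors

# Sorting by self_cpu_memory_usage surfaces the largest allocators first
table = prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=8)
print(table)
```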

Thanks! Please correct me if I am wrong!

What version are you seeing the problem on?

v2.4

How to reproduce the bug

profiler = PyTorchProfiler(
    dirpath=log_dir,  # Directory to save logs
    filename="memory_profile",  # Name of the file to save results
    sort_by_key="self_cpu_memory_usage",  # Sort by CPU memory usage
    export_to_chrome=True,  # Export as JSON for Chrome
    row_limit=16,
    activities=[torch.profiler.ProfilerActivity.CPU],
    profile_memory=True,  # Record CPU memory usage
    with_stack=True,
    record_shapes=True,
)

trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()

Error messages and logs


Environment

Current environment
* CUDA:
	- GPU:
		- NVIDIA 30xx GPU
	- available:         True
	- version:           12.1
* Lightning:
	- lightning:         2.4.0
	- lightning-utilities: 0.11.7
	- pytorch-lightning: 2.4.0
	- torch:             2.3.1
	- torchaudio:        2.3.1
	- torchdata:         0.8.0
	- torchmetrics:      1.4.1
	- torchvision:       0.18.1

Python: 3.12.4

More info

It's not directly related to this issue, but is there a way to get the export_memory_timeline (https://pytorch.org/docs/main/profiler.html#torch.profiler._KinetoProfile.export_memory_timeline) behavior with Lightning's PyTorchProfiler?

Thanks!

Jack12xl avatar Oct 13 '24 01:10 Jack12xl