
Add empty_cache for releasing GPU memory

Open sao2c opened this issue 3 years ago • 11 comments

When I run the code sample below (in a Jupyter notebook), my GPU monitoring shows that the memory allocated during execution is still allocated, despite the fact that I've done everything I know to do to have the intermediate tensors disposed:

#r "nuget:TorchSharp-cuda-linux"
#r "nuget:TorchSharp"

open TorchSharp

let test () =
     use d = torch.NewDisposeScope()
     use tt = torch.randn(50000,50000,device=torch.device("cuda:7"))
     tt.MoveToOuterDisposeScope()

let test2() =
     use d2 = torch.NewDisposeScope()
     use ttt = test()
     ()

let empty_result = test2()

I get a similar result when I do the same experiment in Python with PyTorch:

import torch

def test():
     tt = torch.randn(50000,50000,device=torch.device('cuda:7'))
     return tt

def test2():
     ttt = test()
     return 0

empty_result = test2()

but I can free the memory by calling torch.cuda.empty_cache(). @NiklasGustafsson says:

The underlying library does keep a high-watermark of allocated GPU memory, so even when you dispose of tensors, the overall allocation won't necessarily go down. I'll see how I can get empty_cache() implemented.
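
For reference, the caching-allocator behaviour described here is easy to see from plain PyTorch with its memory inspection functions. A minimal sketch (the device index and tensor size are arbitrary):

import torch

device = torch.device("cuda:0")

t = torch.randn(10000, 10000, device=device)   # ~400 MB of float32
print(torch.cuda.memory_allocated(device))      # bytes held by live tensors
print(torch.cuda.memory_reserved(device))       # bytes held by the caching allocator

del t
print(torch.cuda.memory_allocated(device))      # drops back towards zero
print(torch.cuda.memory_reserved(device))       # stays at the high-watermark

torch.cuda.empty_cache()
print(torch.cuda.memory_reserved(device))       # now this drops too, and nvidia-smi agrees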

sao2c avatar Jan 18 '23 18:01 sao2c

After some digging, I have found the function that would be needed to implement 'empty_cache()'; it's exported from torch_cuda_cpp.dll/.so. We don't statically link against this library when the native component of TorchSharp is built, so we would have to find it at runtime by going looking for the DLL and using the mangled name to import the function. Perfectly doable, but very ugly code.

I'll keep it on the backlog, but it's probably not going to be the highest priority.

NiklasGustafsson avatar Jan 19 '23 22:01 NiklasGustafsson

Thanks, Niklas, for running that down. It does sound like the solution would be pretty gnarly. We're able to do what we need to do as things are now, so we'll be glad to have that feature if it ever comes out, but we're not in a hurry for it.

BTW - I'm pretty new to TorchSharp but I'm really enjoying working in it. Thanks so much for all you do to make it happen!

sao2c avatar Jan 20 '23 17:01 sao2c

@NiklasGustafsson, Looks like we need to port this code to LibTorchSharp first

ChengYen-Tang avatar Aug 09 '23 03:08 ChengYen-Tang

@NiklasGustafsson, Looks like we need to port this code to LibTorchSharp first

That's the CUDA backend (or parts of it); porting it would mean duplication.

It would be better to hack it by dynamically loading and finding the entry in the backend when we know we're loading the CUDA backend (in Torch.cs). It will have a mangled name, which complicates things, since the mangling schemes differ between compilers.
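
To make that concrete, here is a minimal sketch of what such a hack could look like from managed code, using System.Runtime.InteropServices.NativeLibrary. The tryEmptyCache helper, the library path, and the mangled export name are illustrative assumptions only; the real symbol depends on the libtorch build and on which compiler produced the mangling, and the export is assumed here to take no arguments.

open System.Runtime.InteropServices

// Sketch only: the backend library path and the mangled symbol for the
// emptyCache export are hypothetical placeholders.
[<UnmanagedFunctionPointer(CallingConvention.Cdecl)>]
type EmptyCacheDelegate = delegate of unit -> unit

let tryEmptyCache (cudaBackendPath: string) (mangledExport: string) : bool =
    // Load the CUDA backend library at runtime (e.g. torch_cuda_cpp.dll / libtorch_cuda_cpp.so)
    match NativeLibrary.TryLoad(cudaBackendPath) with
    | true, libHandle ->
        // Resolve the mangled export and call it through a delegate
        match NativeLibrary.TryGetExport(libHandle, mangledExport) with
        | true, fnPtr ->
            let emptyCache = Marshal.GetDelegateForFunctionPointer<EmptyCacheDelegate>(fnPtr)
            emptyCache.Invoke()
            true
        | _ -> false
    | _ -> false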

NiklasGustafsson avatar Aug 09 '23 03:08 NiklasGustafsson

As stated above:

After some digging, I have found the function that would be needed to implement 'empty_cache()'; it's exported from torch_cuda_cpp.dll/.so. We don't statically link against this library when the native component of TorchSharp is built, so we would have to find it at runtime by going looking for the DLL and using the mangled name to import the function. Perfectly doable, but very ugly code.

NiklasGustafsson avatar Aug 09 '23 03:08 NiklasGustafsson

@NiklasGustafsson, Looks like we need to port this code to LibTorchSharp first

That's the CUDA backend (or parts of it); porting it would mean duplication.

It would be better to hack it by dynamically loading and finding the entry in the backend when we know we're loading the CUDA backend (in Torch.cs). It will have a mangled name, which complicates things, since the mangling schemes differ between compilers.

I think this is more like a bridge between the CUDA backend and Python; if you open the file, you can see the PyObject binding: https://pytorch.org/docs/stable/_modules/torch/cuda/memory.html#empty_cache https://github.com/pytorch/pytorch/blob/main/torch/_C/init.pyi.in#L1545 https://github.com/pytorch/pytorch/blob/main/torch/csrc/cuda/Module.cpp#L1422

The CUDA backend is here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/api/include/torch/cuda.h https://github.com/pytorch/pytorch/blob/main/torch/csrc/api/src/cuda.cpp

ChengYen-Tang avatar Aug 09 '23 03:08 ChengYen-Tang

Is there a workaround for this?

dennisbromley avatar Jun 20 '24 20:06 dennisbromley

Nope. Fixing it will require changing how we build TorchSharp, since the native entry point is not in the backend-independent libtorch C API. That will require more engineering resources than we have assigned.

NiklasGustafsson avatar Jun 20 '24 20:06 NiklasGustafsson

Any fix or workaround for the torch.cuda.empty_cache() so far?

robtad avatar Feb 24 '25 08:02 robtad

A workaround is to load the native binaries yourself and call the empty_cache function directly.

This has been done, for example, here: https://github.com/K024/llm-sharp

K1T00 avatar Feb 24 '25 10:02 K1T00

Thank you for your response @K1T00. I will check it out.

robtad avatar Feb 24 '25 14:02 robtad