Add empty_cache for releasing GPU memory
When I run the code sample below (in a Jupyter notebook), my GPU monitoring shows that the memory allocated during execution is still held afterwards, even though I've done everything I know to do to have the intermediate tensors disposed:
#r "nuget:TorchSharp-cuda-linux"
#r "nuget:TorchSharp"
open TorchSharp
let test () =
use d = torch.NewDisposeScope()
use tt = torch.randn(50000,50000,device=torch.device("cuda:7"))
tt.MoveToOuterDisposeScope()
let test2() =
use d2 = torch.NewDisposeScope()
use ttt = test()
()
let empty_result = test2()
I get a similar result when I do the same experiment in Python with PyTorch:
```python
import torch

def test():
    tt = torch.randn(50000, 50000, device=torch.device('cuda:7'))
    return tt  # fixed: was `return ttt`, but the local variable is `tt`

def test2():
    ttt = test()
    return 0

empty_result = test2()
```
In Python, though, I can free the memory by calling `torch.cuda.empty_cache()`.

@NiklasGustafsson says:
> The underlying library does keep a high-watermark of allocated GPU memory, so even when you dispose of tensors, the overall allocation won't necessarily go down. I'll see how I can get empty_cache() implemented.
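To make the quoted behavior concrete, here is a minimal C# sketch; the device index and tensor size are illustrative, and it assumes a CUDA-enabled TorchSharp install:

```csharp
using TorchSharp;

// Allocate a large CUDA tensor, then dispose it. The memory goes back to
// libtorch's caching allocator, but the per-process allocation that tools
// like nvidia-smi report stays at the high-watermark, because the allocator
// keeps the freed blocks cached for reuse.
var t = torch.randn(new long[] { 50000, 50000 }, device: torch.device("cuda:7"));
t.Dispose();
// The tensor is gone as far as TorchSharp is concerned, yet the GPU memory
// is still reserved by the process; only an empty_cache() call (not available
// in TorchSharp at the time of this thread) would hand it back to the driver.
```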
After some digging, I have found the function that would be needed to implement 'empty_cache()'; it's exported from torch_cuda_cpp.dll/.so. We don't statically link against this library when the native component of TorchSharp is built, so we would have to find it at runtime by going looking for the DLL and using the mangled name to import the function. Perfectly doable, but very ugly code.
I'll keep it on the backlog, but it's probably not going to be the highest priority.
Thanks, Niklas, for running that down. It does sound like the solution would be pretty gnarly. We're able to do what we need to as things stand, so we'd be glad to have the feature if it ever comes out, but we're not in a hurry for it.
BTW - I'm pretty new to TorchSharp but I'm really enjoying working in it. Thanks so much for all you do to make it happen!
@NiklasGustafsson, looks like we need to port this code to LibTorchSharp first.
That's the CUDA backend (or parts of it); porting it would mean duplication.
It would be better to hack it by dynamically loading the CUDA backend and finding the entry point in it when we know we're loading that backend (in Torch.cs). The entry point will have a mangled name, which complicates things, since mangling schemes differ between compilers.
As stated above:

> After some digging, I have found the function that would be needed to implement 'empty_cache()'; it's exported from torch_cuda_cpp.dll/.so. We don't statically link against this library when the native component of TorchSharp is built, so we would have to find it at runtime by going looking for the DLL and using the mangled name to import the function. Perfectly doable, but very ugly code.
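For concreteness, a rough sketch of what that hack could look like in C#, using System.Runtime.InteropServices.NativeLibrary. The library path and the mangled symbol name are assumptions: the symbol shown is the Itanium-ABI (GCC/Clang) mangling of `c10::cuda::CUDACachingAllocator::emptyCache()`, MSVC builds use a different scheme, and the exported name can vary across libtorch versions, so it would have to be verified per platform.

```csharp
using System;
using System.Runtime.InteropServices;

static class CudaCacheHack
{
    // ASSUMPTION: Itanium-ABI (GCC/Clang) mangling of
    // `void c10::cuda::CUDACachingAllocator::emptyCache()`. Verify the real
    // exported name first (e.g. `nm -D libtorch_cuda.so | grep emptyCache`).
    private const string MangledEmptyCache =
        "_ZN3c104cuda20CUDACachingAllocator10emptyCacheEv";

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate void EmptyCacheDelegate();

    public static void EmptyCache(string cudaBackendPath)
    {
        // Load (or bump the ref-count of) the CUDA backend library, e.g.
        // libtorch_cuda.so on Linux or torch_cuda_cpp.dll on Windows.
        IntPtr lib = NativeLibrary.Load(cudaBackendPath);
        try
        {
            if (!NativeLibrary.TryGetExport(lib, MangledEmptyCache, out IntPtr fn))
                throw new EntryPointNotFoundException(MangledEmptyCache);

            // Marshal the raw export to a callable delegate and invoke it.
            Marshal.GetDelegateForFunctionPointer<EmptyCacheDelegate>(fn)();
        }
        finally
        {
            NativeLibrary.Free(lib);
        }
    }
}
```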
I think this is the bridge between the CUDA backend and Python; if you open the file, you can see the PyObject binding that implements it:
https://pytorch.org/docs/stable/_modules/torch/cuda/memory.html#empty_cache
https://github.com/pytorch/pytorch/blob/main/torch/_C/init.pyi.in#L1545
https://github.com/pytorch/pytorch/blob/main/torch/csrc/cuda/Module.cpp#L1422
The CUDA backend is here:
https://github.com/pytorch/pytorch/blob/main/torch/csrc/api/include/torch/cuda.h
https://github.com/pytorch/pytorch/blob/main/torch/csrc/api/src/cuda.cpp
Is there a workaround for this?
Nope. Fixing it will require changing how we build TorchSharp, since the native entry point is not in the backend-independent libtorch C API. That will take more engineering resources than we currently have assigned.
Any fix or workaround for torch.cuda.empty_cache() so far?
The workaround is to load the native binaries yourself and call the empty_cache entry point directly, as has been done here, for example: https://github.com/K024/llm-sharp
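For illustration, a call site using the hypothetical CudaCacheHack helper sketched earlier in this thread might look like this; the library path is an assumption and depends on your platform and package layout:

```csharp
// Hypothetical usage of the CudaCacheHack sketch above. The path is
// illustrative; point it at wherever the CUDA backend library actually
// lives in your environment (libtorch_cuda.so / torch_cuda_cpp.dll).
CudaCacheHack.EmptyCache("/path/to/libtorch_cuda.so");
```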
Thank you for your response @K1T00. I will check it out.