Fix torch.jit.ScriptModule.zero_grad.

Open hiyuh opened this issue 8 months ago • 2 comments

TorchSharp 0.105.0 doesn't have torch.jit.ScriptModule.zero_grad, so the call incorrectly falls back to torch.nn.Module.zero_grad and then terminates silently. Most probably this is because JITModule is not compatible with NNModule in LibTorchSharp.

As reported in https://github.com/pytorch/pytorch/issues/27144, libtorch also doesn't have torch::jit::Module::zero_grad. As a workaround, this change manually loops over the parameters and zeroes their gradients, like the optimizer does.
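For reviewers, the shape of that workaround can be sketched roughly as below. This is a stdlib-only Python illustration, not the TorchSharp or libtorch code: `Param` is a hypothetical stand-in for a tensor parameter, and the `set_to_none` flag is an assumption modeled on PyTorch's `zero_grad(set_to_none=...)`, not something this PR confirms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Param:
    """Hypothetical stand-in for a parameter tensor (not a TorchSharp type)."""
    name: str
    grad: Optional[List[float]] = None  # None means no gradient accumulated yet


def zero_grad(params: List[Param], set_to_none: bool = True) -> None:
    """Clear gradients by visiting every parameter, as an optimizer would."""
    for p in params:
        if p.grad is None:
            continue  # nothing to clear for this parameter
        if set_to_none:
            p.grad = None  # drop the gradient buffer entirely
        else:
            for i in range(len(p.grad)):
                p.grad[i] = 0.0  # zero in place, keeping the buffer


# Example: one parameter with an accumulated gradient, one without.
params = [Param("weight", [0.5, -1.0]), Param("bias")]
zero_grad(params, set_to_none=False)
```

The point of the sketch is only that no native `zero_grad` entry point is needed: iterating the module's parameters and clearing each gradient is sufficient.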

Note:

  • the RELEASENOTES.md update is intentionally omitted ATM.
    • this avoids repeated conflict & rebase annoyance during MR review.
    • i'll update it later, before merging this MR, if upstream prefers.
  • i'm not sure whether the foreach loop in ScriptModule.zero_grad in src/TorchSharp/JIT/ScriptModule.cs is actually needed.
    • it just mimics what Module.zero_grad in src/TorchSharp/NN/Module.cs does.

hiyuh avatar May 30 '25 06:05 hiyuh

Hey @hiyuh, this looks okay to me. Can you do two things: merge the latest changes from main, and add a line in the release notes (make it NuGet Version 0.105.2, although we might change that) under an API Changes section specifying that you introduced this?

alinpahontu2912 avatar Jul 02 '25 09:07 alinpahontu2912

@alinpahontu2912

  • rebased & updated RELEASENOTES.md as requested.
  • i dunno why only the Windows_x64_NetFX Release_Build failed.
    • most probably b/c of a network failure in the Azure DevOps pipeline?
    • update: the failure is gone, i dunno why...

hiyuh avatar Jul 03 '25 00:07 hiyuh