Results: 17 comments of Dimitri

@dotnet-policy-service agree

Do not merge, I still have some issues.

> @haytham2597 -- just a gentle ping! I think this PR would be very valuable, but it's still a draft, and thus I will not merge it. I also had...

Regarding AMP or Autocast, @NiklasGustafsson, do you have any idea what the single (or more abstract) method for obtaining the tensor would be? Because in autocast, for example, inner-scope on Autocast...

Hi, the last commit `AMP Problem outscope` has an out-of-scope problem, as in this example code:

```csharp
var cast = AMPManager.GetInstance();
var b = torch.rand(new long[] { 3, 3 }, torch.ScalarType.Float32, device:...
```

@NiklasGustafsson Yes, I am trying to change the dtype of B. But I don't think my code is wrong, because of the [CUDA ops](https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16) and the example in [§4]. A few hours ago...
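The linked PyTorch page specifies a per-op cast policy: some CUDA ops (e.g. `matmul`, `mm`, `addmm`, `conv2d`) autocast to float16, while others (e.g. `softmax`, `sum`) are kept in float32 for numerical stability. A small sketch of such a policy table (the entries below are illustrative examples drawn from that list, not exhaustive):

```python
# Per-op autocast policy, modeled on PyTorch's AMP op lists.
# Illustrative subset only; see the linked docs for the full lists.
CAST_TO_FP16 = {"matmul", "mm", "addmm", "conv2d"}
KEEP_FP32 = {"softmax", "log_softmax", "sum", "norm"}

def autocast_dtype(op_name, input_dtype="float32"):
    """Return the dtype an op should run in under autocast."""
    if op_name in CAST_TO_FP16:
        return "float16"
    if op_name in KEEP_FP32:
        return "float32"
    return input_dtype  # unlisted ops run in their input dtype

print(autocast_dtype("matmul"))   # -> float16
print(autocast_dtype("softmax"))  # -> float32
```

This is why changing the dtype of B for a matmul is consistent with autocast semantics: matmul is on the float16 list, so its inputs get cast.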

> This PR is still labeled 'Draft' -- how close do you think you're getting to having it ready to review and merge? I am close, but not close enough. I...

The matmul performance you are getting is interesting, so I replicated it in Release mode with TorchSharp and LibTorch 2.8.0 cu128. As always, because of caching, the first time...

I did more matmul tests and noticed that the difference between TorchSharp and PyTorch is around ~22 microseconds (0.022 milliseconds) in favor of PyTorch, and at least I know...
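The first-call overhead mentioned above (caches, lazy initialization, kernel loading) is why microsecond-scale comparisons like this normally discard warm-up iterations and average over many runs. A minimal timing harness in plain Python, with a placeholder workload standing in for the actual matmul:

```python
import time

def bench(fn, warmup=5, iters=100):
    """Mean seconds per call, discarding warm-up runs that pay
    one-time costs (caches, lazy init, kernel loading)."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Placeholder workload; in the real comparison this would be the
# TorchSharp or PyTorch matmul call being measured.
def workload():
    sum(i * i for i in range(1000))

mean_s = bench(workload)
print(f"{mean_s * 1e6:.1f} us per call")
```

With a harness like this on both sides, a ~22 µs gap can be attributed to per-call overhead rather than first-run effects.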

> Hey @haytham2597, are you still intending to work on this PR? This PR is already finished. At least I know it works and is fast.