TorchSharp Autocast

Soon I will try to implement AMP (Automatic Mixed Precision) with GradScaler.

Feb 11 '24 19:02 haytham2597

@dotnet-policy-service agree

Feb 11 '24 19:02 haytham2597

@haytham2597 -- thank you for your first PR! Much appreciated. Please see the comment I made in the review.

Feb 12 '24 15:02 NiklasGustafsson

Do not merge yet; I still have some issues.

Feb 18 '24 18:02 haytham2597

Lots of errors in the build on everything except the .NET FX builds (which don't have System.Range):

https://dev.azure.com/dotnet/TorchSharp/_build/results?buildId=103093&view=logs&j=80b813b5-9a08-5859-11a8-dc0e5b556e52&t=d3977768-5d05-5555-eccf-169680cb7093

Feb 20 '24 17:02 NiklasGustafsson

I am very happy to see this proposal.

Apr 08 '24 07:04 HCareLou

@haytham2597 -- just a gentle ping! I think this PR would be very valuable, but it's still a draft, and thus I will not merge it. I also had some comments in my review.

Apr 18 '24 20:04 NiklasGustafsson


Yeah, but sorry, I am very busy with studies and work. I need to manage my time very well to make progress on this pull request; it is very useful for me too. But I can share some ideas about it if you want to continue.

While execution is inside the autocast scope, tensors are automatically converted to the autocast dtype. For example:

torch.Tensor a;
using (var ac = torch.NewAutocast()) {
    torch.Tensor b = a;
    torch.Tensor c = torch.arange(...);
}

Both b and c should be automatically converted to float16 (if that is the mixed-precision dtype, coming from float32), including all weights/biases of any modules found inside the scope. For example, a ResNet should be switched to mixed precision.

The idea is very similar to what you do with

using (var d = torch.NewDisposeScope())

And in the outer scope the tensors need to go back to their original dtype, because the network should run backward with the original dtype (as I understand it). With my external THS_Autocast you can determine which dtype should be used and whether autocast is enabled or disabled. I don't know if I explained myself correctly, but feel free to ask.
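The scope idea described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `autocast`, `current_state`, and the `_state` dictionary are hypothetical names made up for illustration, not the TorchSharp or THS_Autocast API. It shows a scope that records the enabled flag and dtype on entry and restores the outer values on exit, including when scopes are nested. It restores only the state, not the dtype of any tensor.

```python
from contextlib import contextmanager

# Hypothetical process-wide autocast state (illustration only).
_state = {"enabled": False, "dtype": "float32"}

@contextmanager
def autocast(dtype="float16", enabled=True):
    """Switch the autocast state for a `with` block, then restore it."""
    previous = dict(_state)              # remember the outer state
    _state.update(enabled=enabled, dtype=dtype)
    try:
        yield
    finally:
        _state.clear()
        _state.update(previous)          # leaving the scope restores it

def current_state():
    return _state["enabled"], _state["dtype"]

with autocast("float16"):
    inner = current_state()
    with autocast(enabled=False):        # nested scope that disables autocast
        nested = current_state()
    after_nested = current_state()
outer = current_state()
```

Saving and restoring the previous state, rather than resetting to a fixed default, is what lets a disabled region sit inside an enabled one.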

Apr 19 '24 02:04 haytham2597

Yeah, no pressure!

We all have other things to do, so I understand completely. Just wanted to let you know we haven't forgotten about your work, and that it will be appreciated, if and when you find time.

Apr 19 '24 16:04 NiklasGustafsson

I would also like to see this completed. It should help with #1136 as well.

Jun 10 '24 14:06 GilesBathgate

Really need this!! Thank you!!

Jun 19 '24 16:06 ingted

About AMP or autocast: @NiklasGustafsson, do you know whether there is a single (or more abstract) method through which tensors are obtained? Inside an autocast scope all tensors should be converted to Float16, and the problem is that Tensor has so many operations (e.g. sum, prod, some linalg, div, etc.) that I would have to cast the tensor to the specific ScalarType in every method. I would like to find one place to do that instead. I am thinking about using the IntPtr of the Tensor and casting to that ScalarType each time it is accessed (because methods like prod, sum, etc. use that IntPtr). Is working with the tensor's IntPtr the best idea?

P.S.: I don't know why I can compile but cannot run the tests. So strange.

Jul 02 '24 21:07 haytham2597

Hi, the last commit, "AMP Problem outscope", has a problem outside the scope, as in this example code:

var cast = AMPManager.GetInstance();
var b = torch.rand(new long[] { 3, 3 }, torch.ScalarType.Float32, device: new torch.Device(DeviceType.CUDA));
Debug.Assert(b.dtype == torch.ScalarType.Float32, "b.dtype == torch.ScalarType.Float32"); //OK
using (cast.Enter())
{
    b = b.mul(1);
    Debug.Assert(b.dtype == torch.ScalarType.Float16, "b.dtype == torch.ScalarType.Float16"); //OK
    var a = torch.rand(new long[] { 3, 3 }, torch.ScalarType.Float32, device:new torch.Device(DeviceType.CUDA));
    Console.WriteLine($"A: {a.dtype}"); //OK Print Float16
}
//Debug.Assert(b.dtype == torch.ScalarType.Float32, "b.dtype == torch.ScalarType.Float32");
Console.WriteLine($"B: {b.dtype}"); //BAD: This prints Float16 instead of Float32
b = b.mul(2);
Console.WriteLine($"B: {b.dtype}"); //BAD: This prints Float16 instead of Float32
var c = torch.rand(new long[] { 3, 3 }, torch.ScalarType.Float32, device: new torch.Device(DeviceType.CUDA));
Console.WriteLine($"C: {c.dtype}"); //OK Print Float32

After leaving the using (cast.Enter()) block, tensor b is not converted back to Float32 (the original ScalarType of b is Float32, but I can't revert it and I don't know why). So right now I have one problem: I can't "uncast" that tensor. Everything works well except when a tensor is used inside the scope and then used outside it, which causes the problem with variable b. Can somebody figure out what the problem is?

Jul 24 '24 22:07 haytham2597

For users to understand this PR

Automatic Mixed Precision package - torch.amp

https://pytorch.org/docs/stable/amp.html#autocasting

Jul 25 '24 06:07 GeorgeS2019

It looks like you're expecting the element type of 'b' to change after you exit the dynamic scope, is that right?

That would mean that you have to do the type conversion in place, at least from the perspective of the managed instance that 'b' refers to -- i.e. replace the handle to the native tensor rather than create a new managed instance. Is that what your code is doing?

Jul 25 '24 13:07 NiklasGustafsson

@NiklasGustafsson

Yes, I am trying to change the dtype of b. But I think my code is not wrong, because of the CUDA ops and the example in [§4].

A few hours ago I noticed that in my code I keep certain IntPtr values that can change. I keep the IntPtr in all instances, but the values inside and outside the scope always differ.

[§1]

//From https://github.com/dotnet/TorchSharp/blob/b032342a78435ba6eb197e4e7db53469ac176aa8/src/TorchSharp/Tensor/Tensor.Math.cs#L1289
public Tensor mul(Scalar target)
{
    //For example Handle = 0x168
    var res = THSTensor_mul_scalar(Handle, target.Handle); //Now res is 0x196
    if (res == IntPtr.Zero) { CheckForErrors(); }
    return new Tensor(res);
}

[§2]

//From my src/TorchSharp/Amp/AMPManager.cs
private void Revert()
{
    for (int i = 0; i < TensorsCasts.Count; i++) {
        var tc = TensorsCasts[i];
        tc.Handle= To(tc.Handle, tc.Dtype); 
    }
}

Now, as in my last comment, after b = b.mul(1); b has a completely new IntPtr. That is, I am saving the tensor "references" the wrong way: because they always change, I can never revert them that way. I think.

In my code, to hold the IntPtr I do this: [§3]

//From src/TorchSharp/Tensor/Tensor.cs in internal Tensor(IntPtr handle)
if (AMPManager.GetInstance().IsEnabled) {
    this.handle = AMPManager.GetInstance().Work(handle, this.handle); //The second argument can be ignored; I was testing other things
} else {
    this.handle = handle;
}

I'm getting dizzy, but I think that in these code examples (§1, for example) 0x168 is no longer available, only 0x196.

Update: [§4]

a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
a_float32_mul = torch.rand((8, 8), device="cuda")
print(f"Dtype of a_float32 Before autocast: {a_float32.dtype}")
print(f"Dtype of a_float32_mul Before autocast: {a_float32_mul.dtype}")
with torch.autocast(device_type="cuda"):
	e_float16= torch.mm(a_float32, b_float32)
	a_float32= torch.mm(a_float32, b_float32)
	a_float32_mul= a_float32_mul.mul(2) 
	print(f"Dtype of e_float16: {e_float16.dtype}")
	print(f"Dtype of a_float32: {a_float32.dtype}")
	print(f"Dtype of a_float32_mul: {a_float32_mul.dtype}")
	
print(f"Dtype of a_float32 OUTSCOPE: {a_float32.dtype}")
print(f"Dtype of a_float32_mul OUTSCOPE: {a_float32_mul.dtype}")

Dtype of a_float32 Before autocast: torch.float32
Dtype of a_float32_mul Before autocast: torch.float32
Dtype of e_float16: torch.float16
Dtype of a_float32: torch.float16
Dtype of a_float32_mul: torch.float32
Dtype of a_float32 OUTSCOPE: torch.float16
Dtype of a_float32_mul OUTSCOPE: torch.float32

Only certain operators (like torch.mm) are autocast to float16; others (like mul) keep the dtype of their input, and a result keeps its dtype after the scope is left. So that means there is nothing wrong with my code. I should change the dtype only for certain operators, for example torch.mm.
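The per-operator behaviour seen in that output can be sketched as a small lookup. This is a toy model, not PyTorch's or TorchSharp's implementation: `LOWER_PRECISION_OPS` and `result_dtype` are hypothetical names, and the table lists only `mm` because that is the one operator the output above shows being cast. The real list of eligible operators is in the torch.amp documentation.

```python
# Illustrative and deliberately incomplete policy table (an assumption).
LOWER_PRECISION_OPS = {"mm"}

# Floating dtypes ordered from narrowest to widest, for the fallback rule.
_ORDER = ["float16", "float32", "float64"]

def result_dtype(op, input_dtypes, autocast_enabled, autocast_dtype="float16"):
    """Return the dtype an op would produce under this toy policy."""
    if autocast_enabled and op in LOWER_PRECISION_OPS:
        return autocast_dtype                 # eligible op: cast down
    # Any other op simply follows its inputs (widest input dtype wins).
    return max(input_dtypes, key=_ORDER.index)

# Inside the scope: mm is cast, mul is not.
mm_inside = result_dtype("mm", ["float32", "float32"], autocast_enabled=True)
mul_inside = result_dtype("mul", ["float32"], autocast_enabled=True)

# Outside the scope: nothing is cast back, so an op on the float16
# result of mm still yields float16.
mul_outside = result_dtype("mul", [mm_inside], autocast_enabled=False)
```

Under this model nothing has to be reverted when the scope ends: the cast is a property of each operator call, not of the tensors that happen to exist inside the scope.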

Glad to be closer to AMP and GradScaler.

Conclusion: I need to read the documentation carefully and test thoroughly in Python.

Jul 25 '24 14:07 haytham2597

b = b.mul(1);

What this statement does is overwrite the variable b with a completely new instance, both native and managed.

On the other hand:

b = b.mul_(1);

would do the multiplication in place, i.e. modify the existing instance:

public Tensor mul_(Tensor target)
{
        THSTensor_mul_(Handle, target.Handle);
        CheckForErrors();
        return this;
}
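The difference between the two statements can be modelled without any tensor library. In this sketch `FakeTensor` and its integer `handle` are hypothetical stand-ins for the managed Tensor and its native IntPtr. The point is only that `mul` returns a new object with a new handle while `mul_` returns the same one, so anything that recorded the old handle no longer refers to what the variable now holds.

```python
import itertools

# Fake "native handles", starting at an arbitrary value for illustration.
_next_handle = itertools.count(0x168)

class FakeTensor:
    def __init__(self, dtype):
        self.handle = next(_next_handle)
        self.dtype = dtype

    def mul(self, k):
        # Out-of-place: a brand-new object with a brand-new handle.
        return FakeTensor(self.dtype)

    def mul_(self, k):
        # In-place: the same object and the same handle are returned.
        return self

b = FakeTensor("float32")
tracked = b                      # what a manager would have recorded

b = b.mul(1)                     # rebinds b to a different object
rebound_is_tracked = b is tracked

c = tracked.mul_(1)              # in-place keeps the identity
inplace_is_tracked = c is tracked
```

After `b = b.mul(1)`, changing anything through `tracked` cannot affect the object that `b` now names, which is why reverting the recorded handles has no visible effect on `b`.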

Jul 25 '24 14:07 NiklasGustafsson

@haytham2597:

This PR is still labeled 'Draft' -- how close do you think you're getting to having it ready to review and merge?

Oct 25 '24 17:10 NiklasGustafsson


I am close, but not there yet. I need to write and test the GradScaler, and I need to find out how to autocast a Module. I am also trying to use the BFloat16 type from LibTorch's c10, because some CPU operators can run as BFloat16, as can GPU ones. And, as we know, netstandard does not have the Half struct (only .NET 5 or newer does), so I added a Half struct for targets older than .NET 5.

TODO:

[ ] C10::BFloat16 and Test
[ ] Finish and Test GradScaler
[x] Test Half Struct for older Net
[x] Autocast Cuda Ops
[ ] Autocast CPU Ops Bfloat16
[ ] Autocast Model, Sequential Module
[ ] Implement Test of TestGradScalingMultiple

Oct 25 '24 19:10 haytham2597

Any update?

Mar 27 '25 15:03 GeorgeS2019