
New frontier for PyToCS: .NET data science and machine learning

Open GeorgeS2019 opened this issue 4 years ago • 10 comments

@uxmal You have been doing this for close to 6 years. Now we would like to challenge you with something you would not have conceived of 6 years ago.

.NET is meeting python HALF WAY!

Instead of the usual Python-to-C# conversion, imagine that this task is made simpler and quick to verify by successful compilation.

Recently, the Microsoft team made the drastic decision to make .NET C#/F# code as close as possible to Python in the context of PyTorch and TorchSharp, as shown in the attached image below.

Questions

When the Python and .NET code look almost the same, WHAT ADJUSTMENTS and MODIFICATIONS does PyToCs need to make this conversion with a high probability of success and minimal post-conversion manual editing?

Can you use the Tests you have created to share your suggestions?

The real world end to end use case is discussed here.

Imagine that .NET Interactive integrates both PyToCs and Roslyn, so that when a Python Jupyter notebook is opened within .NET Interactive, the PyTorch code sections are extracted, converted to e.g. C# using PyToCs, and the conversion is verified by compiling internally with Roslyn. A compilation failure will report which segments of the Python code fail to compile and are still incompatible with TorchSharp. This report is critical for accelerating TorchSharp binding code coverage using real-world scenarios.
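
The first step of the pipeline described above, pulling the PyTorch code sections out of a notebook, could be sketched roughly as follows. This is only a minimal sketch, assuming the usual Jupyter notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`). The function name `extract_torch_cells` and the "cell imports torch" heuristic are hypothetical, and the PyToCs conversion and Roslyn compilation steps are not shown.

```python
def extract_torch_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell that imports torch.

    Assumes notebook["cells"] is a list of dicts with "cell_type" and
    "source" (either a string or a list of line strings).
    """
    selected = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)  # the lines already carry their newlines
        # Hypothetical heuristic: a cell counts as PyTorch code if it imports torch.
        stripped = [line.strip() for line in source.splitlines()]
        if any(s.startswith(("import torch", "from torch")) for s in stripped):
            selected.append(source)
    return selected


# A tiny in-memory notebook, used only for illustration.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Text classification"]},
        {"cell_type": "code", "source": ["from torch import nn\n", "layer = nn.Linear(4, 2)"]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ]
}

torch_cells = extract_torch_cells(example_notebook)
```

Each string in `torch_cells` would then be handed to the converter, and the resulting C# to the compiler, so that failures can be reported per cell.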

I hope this is clear. I hope this is an exciting exercise for the tool you conceived 6 years ago. The .NET deep learning community needs your contribution to extend your tool to a very interesting use case.

image

GeorgeS2019 avatar Nov 09 '21 17:11 GeorgeS2019

Hello, and thanks for your interest in pytocs! It's not quite clear to me what you're asking for but let me attempt to answer the questions you're asking.

When the Python and .NET code look almost the same, WHAT ADJUSTMENTS and MODIFICATIONS does PyToCs need to make this conversion with a high probability of success and minimal post-conversion manual editing?

The Python code fragment in the screenshot could almost be handled by pytocs in its current state. The main stumbling blocks are:

  • type inference. This is still not working as well as I'd like, but "nicely" written Python code can often result in partially OK results. A possible approach is to change pytocs to handle Python type annotations; right now pytocs parses them, but doesn't do anything with them.
  • semantic differences in the Python and C# languages. The dynamic nature of Python is sometimes hard to replicate in C# automatically.
  • various bugs. These need to be identified and chased down. This leads me to your next question:

Can you use the Tests you have created to share your suggestions?

I'm not sure what you're asking here, but you are more than welcome to contribute with pull requests of Python code fragments and their expected translation to C#. You can look at the examples in: https://github.com/uxmal/pytocs/blob/master/src/Pytocs.Tests/ParserAcceptanceTests.cs I can then see what needs to be improved to make fully automatic translation work.

uxmal avatar Nov 09 '21 20:11 uxmal

@uxmal

As more users join in porting PyTorch code to the corresponding TorchSharp code, we will have more converted TorchSharp code with which to "train" the PyToCs conversion of PyTorch.

Given the 6 years of experience you have gained from sharing this project, and just looking at the example provided, could you comment on or suggest how best to make the PyToCs conversion "practical"?

Should the community:

  • Share an FAQ on how to simplify the conversion?
  • Are there user-defined rules in PyToCs that users can customize to make the PyToCs output look more like TorchSharp code?
  • Likewise, based on what PyToCs can achieve, how would you recommend that TorchSharp developers meet the PyToCs output? That is, does TorchSharp need to be more flexible in accepting PyToCs-generated C# code?

Questions

John, I hope you find these questions interesting. This scenario is not restricted to TorchSharp; there are many .NET community projects that are based on Python code. Java to C# is less challenging than Python to C#.

Java-to-C# conversion has also been better supported than Python-to-C# conversion over the last decades.

PERHAPS, now that more .NET projects are attempting to look "Python-like" due to the huge interest in data science and machine learning, do you see a NEED to RETHINK the PyToCs design? If you were to start over, what would you do differently and, MORE IMPORTANTLY, what would you recommend to these .NET communities?

Python Source: TEXT CLASSIFICATION WITH THE TORCHTEXT LIBRARY

from torch import nn

class TextClassificationModel(nn.Module):

    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)

PyToCs conversion

using nn = torch.nn;

public static class PyTorch {
    
    public class TextClassificationModel
        : nn.Module {
        
        public object embedding;
        
        public object fc;
        
        public TextClassificationModel(object vocab_size, object embed_dim, object num_class) {
            this.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse: true);
            this.fc = nn.Linear(embed_dim, num_class);
            this.init_weights();
        }
        
        public virtual object init_weights() {
            var initrange = 0.5;
            this.embedding.weight.data.uniform_(-initrange, initrange);
            this.fc.weight.data.uniform_(-initrange, initrange);
            this.fc.bias.data.zero_();
        }
        
        public virtual object forward(object text, object offsets) {
            var embedded = this.embedding(text, offsets);
            return this.fc(embedded);
        }
    }
}

Manual conversion


using static TorchSharp.torch;
using static TorchSharp.torch.nn;
using static TorchSharp.torch.nn.functional;

 class TextClassificationModel : Module
 {
     private Modules.EmbeddingBag embedding;
     private Modules.Linear fc;

     public TextClassificationModel(long vocab_size, long embed_dim, long num_class) : base("TextClassification")
     {
         embedding = EmbeddingBag(vocab_size, embed_dim, sparse: false);
         fc = Linear(embed_dim, num_class);
         InitWeights();

         RegisterComponents();
     }

     private void InitWeights()
     {
         var initrange = 0.5;

         init.uniform_(embedding.Weight, -initrange, initrange);
         init.uniform_(fc.Weight, -initrange, initrange);
         init.zeros_(fc.Bias);
     }

     public override Tensor forward(Tensor t)
     {
         throw new NotImplementedException();
     }

     public override Tensor forward(Tensor input, Tensor offsets)
     {
         using var t = embedding.forward(input, offsets);
         return fc.forward(t);
     }

     public new TextClassificationModel to(Device device)
     {
         base.to(device);
         return this;
     }
 }

GeorgeS2019 avatar Nov 09 '21 23:11 GeorgeS2019

I think the design of pytocs as it stands now is fine. It's a transpiler that converts Python source code to C# source code, trying to bridge the syntactic and semantic gap between the two languages. The biggest area for improvement is type inference support. It would be fantastic if pytocs could do a better job of inferring -- or using type hints -- to provide more accurate initial results. That's a question of people providing (small) samples of source code where they think pytocs could do a better job of inferring types, and fixing those. Naturally, contributions are welcome.
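
As an illustration of what "using type hints" could mean in practice, the sketch below reads the annotations on a function's parameters with Python's standard `ast` module and looks up a C# type name for each. It is not how pytocs works internally. The mapping table `CSHARP_TYPES` and the function `csharp_parameter_types` are hypothetical, and unannotated parameters fall back to `object`, as in the PyToCs output shown earlier in this thread.

```python
import ast

# Hypothetical mapping from Python annotation names to C# type names.
CSHARP_TYPES = {"int": "long", "float": "double", "bool": "bool", "str": "string"}


def csharp_parameter_types(source: str, function_name: str) -> dict[str, str]:
    """Map each parameter of the first function named `function_name` to a C# type."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            result = {}
            for arg in node.args.args:
                if arg.arg == "self":
                    continue  # the receiver has no C# parameter
                annotation = arg.annotation
                if isinstance(annotation, ast.Name):
                    # Simple annotations such as `int`; unknown names become object.
                    result[arg.arg] = CSHARP_TYPES.get(annotation.id, "object")
                else:
                    # No annotation, or one too complex for this sketch.
                    result[arg.arg] = "object"
            return result
    raise ValueError(f"function {function_name!r} not found")


annotated = '''
class TextClassificationModel:
    def __init__(self, vocab_size: int, embed_dim: int, num_class):
        pass
'''

types = csharp_parameter_types(annotated, "__init__")
```

Here `vocab_size` and `embed_dim` resolve to `long` because they are annotated, while the unannotated `num_class` stays `object`. This is the kind of gap that small annotated samples would help close.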

I think providing a 100% automatic translation of idiomatic Python source code is not possible. There are constructs in Python that just cannot be translated easily/automatically to C#, but require human intervention. I've already outlined in the pytocs documentation (https://github.com/uxmal/pytocs/blob/master/doc/HOWTO.md) a suitable git workflow that can track an active Python project and generate C#. I use this workflow in my personal projects and it works just fine.

uxmal avatar Nov 12 '21 12:11 uxmal

John, thanks again for taking the time to share your insight, which is valuable and not easy to gain by just looking through the code.

We are currently running a one-week ML.NET hackathon. I will share your valuable insight with the other participants when they attempt to port Python code to .NET, e.g. for TorchSharp or TensorFlow.NET. Thank you.

GeorgeS2019 avatar Nov 12 '21 12:11 GeorgeS2019

@uxmal A quick update: the decision to use PyTorch-like syntax in TorchSharp has led to more community adoption. The TorchSharp community has grown significantly, and the degree of PyTorch coverage is increasing at a steady pace.

GeorgeS2019 avatar Jul 11 '22 06:07 GeorgeS2019

PyTorch and TorchSharp follow different design concepts.

python code

self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)

C# code

this.conv1 = nn.Conv1d(inputChannel: d_model, outputChannel: d_ff, kernelSize: 1, bias: false);

The parameter names are different.

Some methods exist in PyTorch but are not documented, and TorchSharp does not support such methods.

See https://github.com/dotnet/TorchSharp/issues/901

toolgood avatar Feb 11 '23 03:02 toolgood

@toolgood If you look into PyToCS, the parameter names could be mapped from the PyTorch version to the TorchSharp version. This would speed up adoption of TorchSharp by beginners coming from PyTorch.
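
One way such a parameter-name mapping could work is sketched below. This is a sketch only, not PyToCS functionality: it renames keyword arguments on the Python side, before any conversion, using Python's standard `ast` module. The table `KEYWORD_RENAMES` and the helper `rename_keywords` are hypothetical, and the TorchSharp names are simply those from the `Conv1d` example above.

```python
import ast

# Keyword renames taken from the Conv1d example in this thread; entries for
# other modules would have to be collected separately.
KEYWORD_RENAMES = {
    "Conv1d": {
        "in_channels": "inputChannel",
        "out_channels": "outputChannel",
        "kernel_size": "kernelSize",
    },
}


class KeywordRenamer(ast.NodeTransformer):
    """Rename keyword arguments on calls whose callee name is in the table."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # process nested calls first
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr  # e.g. nn.Conv1d -> "Conv1d"
        else:
            name = getattr(func, "id", None)  # e.g. a bare Conv1d(...)
        renames = KEYWORD_RENAMES.get(name, {})
        for keyword in node.keywords:
            if keyword.arg in renames:
                keyword.arg = renames[keyword.arg]
        return node


def rename_keywords(source: str) -> str:
    tree = KeywordRenamer().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or later


python_line = (
    "self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, "
    "kernel_size=1, bias=False)"
)
renamed = rename_keywords(python_line)
```

The result is still Python syntax, with only the keyword names changed, so it could be fed to the converter as usual. Working on the syntax tree rather than on raw text avoids renaming identifiers that merely happen to share a parameter's name.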

GeorgeS2019 avatar Feb 11 '23 03:02 GeorgeS2019

@uxmal I have written part of the code to convert to TorchSharp, using text replacement and regular-expression replacement.

toolgood avatar Feb 11 '23 03:02 toolgood

@uxmal Could you evaluate and then merge the PR submitted by @toolgood? :-)

GeorgeS2019 avatar Feb 12 '23 01:02 GeorgeS2019

From @uxmal Nov 2021: You can look at the examples in: https://github.com/uxmal/pytocs/blob/master/src/Pytocs.Tests/ParserAcceptanceTests.cs I can then see what needs to be improved to make fully automatic translation work.

@toolgood I have not looked into your PR yet; I am just curious whether you took @uxmal's suggestion into consideration. Perhaps @uxmal has additional suggestions?

GeorgeS2019 avatar Feb 12 '23 01:02 GeorgeS2019