machinelearning icon indicating copy to clipboard operation
machinelearning copied to clipboard

SchemaDefinition.Create picks up internal fields instead of just public only - affects F#

Open fwaris opened this issue 3 years ago • 9 comments

This used to work before but now I cannot use CreateFromEnumerable<T> in F# now.

In F#, we define mutable classes by annotating F# 'record' types with CLIMutable:

[<CLIMutable>]
type D = 
   { 
         Data :  float[]
   }

F# compiler generates IL that looks as follows:

public class D  [ interface list  ] 
{
        internal Data@ = Double[];

  	public Double[] Data
	{
		get
		{
			return Data@;
		}
		set
		{
			Data@ = value;
		}
	}

     [ more methods including a default constructor ]
}

The SchemaDefinition.Create method picks up both "Data@" and "Data" as fields required by the schema. It should only pickup the public fields.

Please add unit tests for F# as well so these issues are caught earlier.

@dsyme

fwaris avatar May 30 '22 13:05 fwaris

I just found out that one workaround for the above issue is to set "ignoreMissingColumns=true" when calling CreateEnumerable(...)

fwaris avatar May 30 '22 15:05 fwaris

Glad you found a workaround, but yeah that isn't the behavior we want I don't think.

When did you notice this change? We haven't made a recent changes to SchemaDefinition.Create that I am aware of. I wonder if it has to do with a .NET version change? What version are you using now? Could you also try an older version?

michaelgsharp avatar Jun 13 '22 17:06 michaelgsharp

I think the change was F# tooling rather than ML.Net (rethinking the issue). Some newer version of F# (maybe 5.x) changed the IL generation for mutable records, surfacing this issue.

Totally agree that only public fields should be considered for serialization/deserialization.

Note also that, as-is, when creating a DataView (LoadFromEnumerable…) , the xxx@ fields also get added into the dataset, bloating its size.

Sent from Mailhttps://go.microsoft.com/fwlink/?LinkId=550986 for Windows

From: Michael @.> Sent: Monday, June 13, 2022 1:51 PM To: @.> Cc: @.>; @.> Subject: Re: [dotnet/machinelearning] SchemaDefinition.Create picks up internal fields instead of just public only - affects F# (Issue #6209)

Glad you found a workaround, but yeah that isn't the behavior we want I don't think.

When did you notice this change? We haven't made a recent changes to SchemaDefinition.Create that I am aware of. I wonder if it has to do with a .NET version change? What version are you using now? Could you also try an older version?

— Reply to this email directly, view it on GitHubhttps://github.com/dotnet/machinelearning/issues/6209#issuecomment-1154210278, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AACGZMFVW53NGRAHXS6NG2TVO5YKFANCNFSM5XKT6S2Q. You are receiving this because you authored the thread.Message ID: @.***>

fwaris avatar Jun 14 '22 16:06 fwaris

Since this is a change by the F# tooling there is nothing that we can do about it unfortunately. Closing this for now, but if things change in the future please create a new issue and we will take another look.

michaelgsharp avatar Jul 11 '22 17:07 michaelgsharp

@michaelgsharp Just to check....

There are some situations where F# needs to emit public fields - notably when emitting code into multiple assemblies in F# Interactive.

My understanding is that serialization of fields should ignore

  • internal fields
  • private fields
  • public fields marked "CompilerGenerated".

This is what System.Json and Newtonsoft.Json both do. Does SchemaDefinition do that? It would be great if it does.

See also https://github.com/dotnet/fsharp/pull/13494 which is related

@fwaris Do you know if you hit this issue when using F# Interactive?

Thanks Don

dsyme avatar Jul 11 '22 23:07 dsyme

@dsyme let me reopen the issue for now while I look into that. I am not actually sure off of the top of my head.

michaelgsharp avatar Jul 13 '22 17:07 michaelgsharp