Mobius icon indicating copy to clipboard operation
Mobius copied to clipboard

Serialization issues with Mono 5

Open purefunkce opened this issue 7 years ago • 1 comments

I get a serialization error any time I'm passing data at runtime. This happens with any calls using a lambda expression with data originating outside of the lambda expression. Using broadcast variables also gives the same error.

Gives serialization error: string x="/path"; var results = rdd.Map(input => { Console.WriteLine(x); });

but, no serialization error here: var results = rdd.Map(input => { Console.WriteLine("/path"); });

This is running 2.0.2 Spark, Linux Mono 5.10.1.20, built with msbuild. I've also tested Mono4.8.1 with xbuild and get the same error.

actual error:

ERROR System.Runtime.Serialization.SerializationException: Type '(MyClassName+<>c__DisplayClass4_0' in Assembly 'MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' is not marked as serializable. at System.Runtime.Serialization.FormatterServices.InternalGetSerializableMembers (System.RuntimeType type) [0x00045] in <8fbafb724c144c9dad69bccfec38ae40>:0 at System.Runtime.Serialization.FormatterServices+<>c__DisplayClass9_0.<GetSerializableMembers>b__0 (System.Runtime.Serialization.MemberHolder _) [0x00000] in <8fbafb724c144c9dad69bccfec38ae40>:0 at System.Collections.Concurrent.ConcurrentDictionary2[TKey,TValue].GetOrAdd (TKey key, System.Func2[T,TResult] valueFactory) [0x00034] in <8fbafb724c144c9dad69bccfec38ae40>:0 at System.Runtime.Serialization.FormatterServices.GetSerializableMembers (System.Type type, System.Runtime.Serialization.StreamingContext context) [0x0005e] in <8fbafb724c144c9dad69bccfec38ae40>:0

purefunkce avatar Apr 03 '18 06:04 purefunkce

Ok, so for anyone else out there thinking their pyspark style lambda expressions are going to work you are going to have a bad day. There is a fundamental issue, namely the C# compiler generates anonymous functions as non-serializable. SparkCLR does some work to serialize fully closed lambda expressions, but any lambda's that reference outside their scope will be picked up by the C# compiler and made non-serializable.

I'm sure there are ways around this to elegantly turn any anonymous function into a method on a serializable class, but it has thus far eluded me. The bottom line is that you need to make a serializable helper class and use a method from that class directly in map, etc. without any lambda expressions that are not fully closed.

From the sharp CLR samples there is a BroadcastHelper class that can help you transfer data as a broadcast variable, and you can use this architecture to send any sort of data to the worker threads by first initializing a new object with the data you want used in the delegate: [Serializable] internal class BroadcastHelper<T,U> { private readonly Broadcast<T> broadcastVar; internal BroadcastHelper(Broadcast<T> broadcastVar) { this.broadcastVar = broadcastVar; } internal T Execute(U i) { return broadcastVar.Value; } }

now, this can work: string parallel = "test serialization string"; var broadcast = sc.Broadcast(parallel); Console.WriteLine(test.Map(new BroadcastHelper<string,int>(broadcast).Execute).First());

whereas string parallel = "test serialization string"; var broadcast = sc.Broadcast(parallel); Console.WriteLine(test.Map(x=>broadcast.Value).First()); == bad day

I would love to chat with anyone out there who has the experience to make these kind of lambda expressions work out of the box without making helper classes.

purefunkce avatar Apr 12 '18 06:04 purefunkce