Snowpark support

Open sonalgoyal opened this issue 3 years ago • 4 comments

It would be nice to have a native Snowflake version of Zingg that can run without Spark. After looking at Snowpark, here are my first-level thoughts:

  • We need to figure out the Java/Scala integration, as Snowpark is in Scala. This should be doable, but some changes may need conversions etc.
  • Snowpark doesn't have GraphFrames and MLlib equivalents.
  • The Dataset API looks similar, but there may be gaps.
  • The pipe abstraction will no longer be needed, as we will assume the data is in Snowflake.
  • The interactive learner needs to be thought through. Where will its interface live?
  • SparkContext and related classes need to change.

Our approach could be

  • adapter pattern
  • parameterization of classes for both types
  • new code base (NO!!!)
  • some other approach
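
As a rough illustration of the adapter option above, here is a minimal Java sketch. All names in it (`ZFrame`, `ListBackedFrame`, `TrainingDataFinder`) are hypothetical and not taken from the Zingg code base, and a plain list stands in for a Spark `Dataset` or Snowpark `DataFrame` so the example runs without either library. The idea is only that shared logic talks to a small engine-neutral interface, and each engine gets its own adapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical engine-neutral abstraction; not part of Zingg's actual API.
interface ZFrame<R> {
    ZFrame<R> filter(Predicate<R> condition);
    long count();
}

// Stand-in adapter. A real one would wrap a Spark Dataset or a Snowpark
// DataFrame; a plain list is used here so the sketch has no dependencies.
final class ListBackedFrame<R> implements ZFrame<R> {
    private final List<R> rows;

    ListBackedFrame(List<R> rows) {
        this.rows = new ArrayList<>(rows);
    }

    @Override
    public ZFrame<R> filter(Predicate<R> condition) {
        List<R> kept = new ArrayList<>();
        for (R row : rows) {
            if (condition.test(row)) {
                kept.add(row);
            }
        }
        return new ListBackedFrame<>(kept);
    }

    @Override
    public long count() {
        return rows.size();
    }
}

// Engine-agnostic logic: it only sees ZFrame, never the underlying engine.
final class TrainingDataFinder {
    static <R> long countCandidates(ZFrame<R> input, Predicate<R> isCandidate) {
        return input.filter(isCandidate).count();
    }
}
```

With this shape, supporting a second engine would mean adding one more `ZFrame` implementation rather than touching callers such as `TrainingDataFinder`.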

I will try to form an opinion on this after checking one flow (findTrainingData?) to see what needs to be done to make it Snowpark-compatible. I will jot down findings here.

sonalgoyal, Feb 10 '22 09:02

  • pom: should we do shimming?
  • Client.java: JavaSparkContext.jarOfClass(IZinggFactory.class); we should be able to shift it elsewhere.
  • FieldDefinition.java: the datatype is Spark-based. Can it be parameterized?
  • Pipe also refers to DataType and StructType. There are corresponding classes in the Snowpark client.
  • Util.java can be cleaned up.
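
One way to read the "can be parameterized" question for FieldDefinition is a generic type parameter for the engine's data type. The sketch below is only an assumption about what that could look like: this `FieldDefinition<T>` is a simplified stand-in and not Zingg's actual class, and `SparkLikeType` and `SnowparkLikeType` are hypothetical placeholder enums for the real Spark and Snowpark `DataType` classes.

```java
// Hypothetical, simplified stand-in for a field definition that is generic
// over the engine's data type T instead of being tied to Spark's DataType.
final class FieldDefinition<T> {
    private final String fieldName;
    private final T dataType;

    FieldDefinition(String fieldName, T dataType) {
        this.fieldName = fieldName;
        this.dataType = dataType;
    }

    String getFieldName() {
        return fieldName;
    }

    T getDataType() {
        return dataType;
    }
}

// Placeholder type systems; real code would use each engine's own classes.
enum SparkLikeType { STRING, DOUBLE }

enum SnowparkLikeType { STRING, DOUBLE }
```

A Spark flow would then use `FieldDefinition<SparkLikeType>` and a Snowflake flow `FieldDefinition<SnowparkLikeType>`, with the compiler keeping the two from being mixed.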

sonalgoyal, Feb 10 '22 12:02

Initial Java and Scala files to test the basics of Snowpark: MainTest.java.txt, Main.scala.txt

navinrathore, Feb 11 '22 03:02

Hey @sonalgoyal @navinrathore, looking forward to using Zingg natively on top of Snowflake. Do you have any timelines in mind for this feature?

nipunj15, Jul 25 '22 12:07

That is great to hear, @nipunj15. Sorry, we do not have a timeline yet.

sonalgoyal, Jul 25 '22 14:07

Snowflake is now natively supported in Zingg Enterprise.

sonalgoyal, Mar 07 '23 17:03