Missing Dataset methods
Here is an exhaustive status of the API implemented by frameless.TypedDataset compared to Spark's Dataset. We are getting pretty close to 100% API coverage :smile:
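For context, a minimal sketch of the typed API this checklist refers to. It assumes an implicit SparkSession in scope and a frameless version with symbol-based column selection; the Apartment case class and its fields are illustrative, not part of frameless:

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset

case class Apartment(city: String, surface: Int)

implicit val spark: SparkSession =
  SparkSession.builder().master("local[*]").appName("frameless-sketch").getOrCreate()

// A TypedDataset wraps a Dataset[Apartment]; columns are referenced by symbol
// and resolved against the case class at compile time.
val apartments: TypedDataset[Apartment] =
  TypedDataset.create(Seq(Apartment("Paris", 50), Apartment("Lyon", 45)))

// Typed counterparts of select/filter from the "Done" list below.
val surfaces = apartments.select(apartments('surface))
val inParis  = apartments.filter(apartments('city) === "Paris")

// Actions return a Job and only execute when run() is called.
inParis.show().run()
```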
Won't fix (see the sketch after this list for why these are rejected):
- [ ] Dataset<T> alias(String alias) (inherently unsafe)
- [ ] Dataset<Row> withColumnRenamed(String existingName, String newName) (inherently unsafe)
- [ ] void createGlobalTempView(String viewName) (inherently unsafe)
- [ ] void createOrReplaceTempView(String viewName) (inherently unsafe)
- [ ] void createTempView(String viewName) (inherently unsafe)
- [ ] void registerTempTable(String tableName) (inherently unsafe)
- [ ] Dataset<T> where(String conditionExpr) (use select instead)
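These methods take column names or SQL fragments as plain strings, so mistakes surface only at runtime. A short sketch of the contrast, reusing the illustrative Apartment dataset from the intro (the misspelled column name is deliberate):

```scala
// Vanilla Spark on the underlying DataFrame: a misspelled column compiles fine
// and only fails at runtime with an AnalysisException.
apartments.dataset.toDF().where("cty = 'Paris'")

// frameless: a misspelled column is a compile-time error, so there is nothing
// safe to expose for the string-based variants listed above.
// apartments.filter(apartments('cty) === "Paris")   // does not compile
apartments.filter(apartments('city) === "Paris")     // compiles, typed
```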
TODO:
- [ ] <K> KeyValueGroupedDataset<K,T> groupByKey(scala.Function1<T,K> func, Encoder<K> evidence3)
- [ ] DataFrameNaFunctions na()
- [ ] DataFrameStatFunctions stat()
- [ ] Dataset<T> dropDuplicates(String col1, String... cols)
- [ ] Dataset<Row> describe(String... cols)
- [ ] DataStreamWriter<T> writeStream() (see #232)
- [ ] Dataset<T> withWatermark(String eventTime, String delayThreshold) (see #232)
- [x] RelationalGroupedDataset cube(Column... cols) (WIP #246)
- [x] RelationalGroupedDataset rollup(String col1, String... cols) (WIP #246)
Done:
- [x] Dataset<T> sort(String sortCol, String... sortCols) (#248)
- [x] Dataset<T> sortWithinPartitions(String sortCol, String... sortCols) (#248)
- [x] Dataset<T> repartition(int numPartitions, Column... partitionExprs)
- [x] Dataset<Row> drop(String... colNames) (#209)
- [x] Dataset<Row> join(Dataset<?> right, Column joinExprs, String joinType)
- [x] <U> Dataset<scala.Tuple2<T,U>> joinWith(Dataset<U> other, Column condition, String joinType)
- [x] Dataset<Row> crossJoin(Dataset<?> right)
- [x] Dataset<Row> agg(Column expr, Column... exprs)
- [x] Column apply(String colName)
- [x] <U> Dataset<U> as(Encoder<U> evidence2)
- [x] Dataset<T> cache()
- [x] Dataset<T> coalesce(int numPartitions)
- [x] Column col(String colName)
- [x] Object collect()
- [x] long count()
- [x] Dataset<T> distinct()
- [x] Dataset<T> except(Dataset<T> other)
- [x] void explain(boolean extended)
- [x] <A,B> Dataset<Row> explode(String inputColumn, String outputColumn, scala.Function1<A,TraversableOnce<B>> f)
- [x] Dataset<T> filter(Column condition)
- [x] Dataset<T> filter(scala.Function1<T,Object> func)
- [x] T first() (as firstOption)
- [x] <U> Dataset<U> flatMap(scala.Function1<T,TraversableOnce<U>> func, Encoder<U> evidence8)
- [x] void foreach(ForeachFunction<T> func)
- [x] void foreachPartition(scala.Function1<Iterator<T>,scala.runtime.BoxedUnit> f)
- [x] RelationalGroupedDataset groupBy(String col1, String... cols)
- [x] Dataset<T> intersect(Dataset<T> other)
- [x] Dataset<T> limit(int n)
- [x] <U> Dataset<U> map(scala.Function1<T,U> func, Encoder<U> evidence6)
- [x] <U> Dataset<U> mapPartitions(MapPartitionsFunction<T,U> f, Encoder<U> encoder)
- [x] Dataset<T> persist(StorageLevel newLevel)
- [x] void printSchema()
- [x] RDD<T> rdd()
- [x] T reduce(scala.Function2<T,T,T> func) (as reduceOption)
- [x] Dataset<T> repartition(int numPartitions)
- [x] Dataset<T> sample(boolean withReplacement, double fraction, long seed)
- [x] Dataset<Row> select(String col, String... cols)
- [x] void show(int numRows, boolean truncate)
- [x] Object take(int n)
- [x] Dataset<Row> toDF()
- [x] String toString()
- [x] <U> Dataset<U> transform(scala.Function1<Dataset<T>,Dataset<U>> t)
- [x] Dataset<T> union(Dataset<T> other)
- [x] Dataset<T> unpersist(boolean blocking)
- [x] Dataset<Row> withColumn(String colName, Column col)
- [x] Dataset<T> orderBy(String sortCol, String... sortCols)
- [x] String[] columns()
- [x] org.apache.spark.sql.execution.QueryExecution queryExecution()
- [x] StructType schema()
- [x] SparkSession sparkSession()
- [x] SQLContext sqlContext()
- [x] Dataset<T> checkpoint(boolean eager)
- [x] String[] inputFiles()
- [x] boolean isLocal()
- [x] boolean isStreaming()
- [x] Dataset<T>[] randomSplit(double[] weights, long seed)
- [x] StorageLevel storageLevel()
- [x] Dataset<String> toJSON()
- [x] java.util.Iterator<T> toLocalIterator()
- [x] DataFrameWriter<T> write()
<A,B> Dataset<Row> explode(String inputColumn, String outputColumn, scala.Function1<A,TraversableOnce<B>> f) is not working for Map-typed columns, while vanilla Spark supports it.
Yes, I was not able to fit Map in because its type signature has two holes, compared to one for all the others. I think we can have an overloaded method just for Map.
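For reference, a small sketch of where the extra "hole" comes from: in vanilla Spark, exploding a Map column yields two output columns (key and value), while exploding an array yields one, so a Map-specific overload would need two output type parameters. The DataFrames below are illustrative only and reuse the spark session from the sketch in the intro:

```scala
import org.apache.spark.sql.functions.explode

// Vanilla Spark: explode on a Map column produces two columns, key and value.
val mapped = spark.createDataFrame(Seq((1, Map("a" -> 10, "b" -> 20)))).toDF("id", "m")
mapped.select(mapped("id"), explode(mapped("m"))).show()
// resulting columns: id, key, value

// Exploding an array produces a single column, matching the one-hole signature
// of the existing explode(String, String)(f: A => TraversableOnce[B]).
val arrayed = spark.createDataFrame(Seq((1, Seq(10, 20)))).toDF("id", "xs")
arrayed.select(arrayed("id"), explode(arrayed("xs"))).show()
// resulting columns: id, col
```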
writeStream can be marked as done here.