Otto von Sperling
When I try to pass a Ray `ObjectRef` to AutoML's `fit`, I get an error saying that a NumPy array, pandas DataFrame, or SciPy sparse matrix is expected.
Thank you for the quick reply, @sonichi. Currently I do everything in Spark with Scala. I'm interested in using FLAML both because of the impressive CFO algorithm and also to...
Integrating with Ray's object store looks very promising. Thank you for the suggestion. I will run some experiments and post feedback in this thread for future reference.
Perfect! I'm working with XGBoost, which is also built-in. Once I finish playing with this, I will share my `train_xgboost` function. Maybe we can create a section in the Docs...
The relevant logs are:

```
java.lang.IllegalArgumentException: requirement failed: Number of partitions (0) must be positive.
  at scala.Predef$.require(Predef.scala:281)
  at org.apache.spark.sql.catalyst.plans.logical.Repartition.<init>(basicLogicalOperators.scala:1372)
  at org.apache.spark.sql.Dataset.repartition(Dataset.scala:3022)
  at com.databricks.labs.automl.model.tools.split.SplitOperators$.optimizeTestTrain(SplitOperators.scala:371)
  at com.databricks.labs.automl.model.tools.split.DataSplitUtility.$anonfun$trainSplitPersist$1(DataSplitUtility.scala:108)
  at com.databricks.labs.automl.model.tools.split.DataSplitUtility.$anonfun$trainSplitPersist$1$adapted(DataSplitUtility.scala:85...
```
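For reference, the failure mode in that trace is a computed repartition count of zero, which Spark's `Dataset.repartition(n)` rejects. A minimal sketch (in Python, with a hypothetical helper name, not the library's code) of the kind of guard that would avoid it:

```python
def safe_partition_count(row_count: int, rows_per_partition: int = 500_000) -> int:
    """Hypothetical guard: Spark's Dataset.repartition(n) requires n > 0,
    so clamp the computed partition count to at least 1, even when the
    split that produced row_count came out empty."""
    return max(1, row_count // rows_per_partition)
```

With this clamp, an empty split yields `safe_partition_count(0) == 1` instead of the `0` that triggers the `requirement failed` error.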
~~Since yesterday, I have tried using `FamilyRunner`, and it works as long as I don't use the "chronological" split method.~~ ~~The error I get with `FamilyRunner` is different from the above. In...
The same error persists with `FamilyRunner`. I am investigating the problem.
I believe I have found the problem. Due to a large imbalance between the classes in my label column, at some point `Ksplit` apparently creates an empty train/test set for the minority...
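To illustrate the failure mode in isolation: with a heavily imbalanced label, a naive shuffle-and-chop k-split can leave a fold with no minority rows at all, whereas a stratified split deals each class across the folds. A small self-contained sketch (hypothetical function names, not the library's implementation):

```python
import random

def naive_ksplit(labels, k):
    # Shuffle all indices and deal them into k folds. With heavy class
    # imbalance, a fold can end up with zero minority examples, which is
    # what produces an empty train/test set downstream.
    idx = list(range(len(labels)))
    random.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def stratified_ksplit(labels, k):
    # Deal each class's indices round-robin, so every fold receives
    # minority rows whenever the class has at least k members.
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idxs in by_class.values():
        random.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

labels = [0] * 97 + [1] * 3          # 3 minority rows, k = 3 folds
folds = stratified_ksplit(labels, 3)
# every fold now holds exactly one minority index
```

With only 3 minority rows and `k = 3`, the stratified version guarantees one minority row per fold; the naive version offers no such guarantee.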
I apologize for creating yet another PR. This will be the last one concerning this issue.
I see that extra parameters passed as `sfOptions` are provided to the JDBC driver. However, [the key is transformed to lower case](https://github.com/snowflakedb/spark-snowflake/blob/a8cb588f36fbf56701695a69546ed9817fa49c21/src/main/scala/net/snowflake/spark/snowflake/SnowflakeJDBCWrapper.scala#L196) which means that `queryTimeout` becomes `querytimeout`, which I...
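The effect of that lower-casing, sketched in isolation (a Python stand-in for the wrapper's behavior, not its actual Scala code): once option keys pass through a case-folding map, a camelCase JDBC property no longer reaches the driver under its expected name.

```python
def lowercase_keys(options: dict) -> dict:
    # Stand-in for the wrapper's key normalization: every option key is
    # folded to lower case before being handed to the JDBC driver.
    return {k.lower(): v for k, v in options.items()}

opts = lowercase_keys({"queryTimeout": "30"})
# the driver looks up "queryTimeout" but only "querytimeout" survives
```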