Emily May Curtin
GitHub automatically detected security vulnerabilities in the Gemfile for the docs site and raised an alert. The Gemfile should be easy enough to update.
This applies to all output formats, but as an example, it would be nice to be able to write compressed CSV from Spark-Bench. To do this in code is...
Because Spark evaluates data lazily, there is no real distinction between loading and transforming data. The results field name should be changed accordingly so that it does not mislead users.
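To illustrate why a separate "load" measurement can mislead, here is a minimal sketch that uses plain Python generators as a stand-in for lazy evaluation. It is not Spark or Spark-Bench code; the names `lazy_load`, `lazy_transform`, and `run` are hypothetical and exist only for this illustration. The point it shows is that nothing is read when the "load" step is declared, and that loading and transforming are interleaved once an action forces evaluation.

```python
import time


def lazy_load(rows, log):
    """Stand-in for a lazy read: returns a generator, so no row is touched yet."""
    for row in rows:
        log.append(("load", row))
        yield row


def lazy_transform(records, log):
    """Stand-in for a lazy transformation chained onto the read."""
    for value in records:
        log.append(("transform", value))
        yield value * 2


def run(rows):
    """Declare load and transform, then force evaluation with a single action."""
    log = []

    start = time.perf_counter()
    loaded = lazy_load(rows, log)  # only declares the read; returns immediately
    declared_load_seconds = time.perf_counter() - start
    steps_after_load = len(log)  # no rows have been read at this point

    transformed = lazy_transform(loaded, log)

    start = time.perf_counter()
    result = list(transformed)  # the "action": load and transform both run here
    action_seconds = time.perf_counter() - start

    return {
        "result": result,
        "steps_after_load": steps_after_load,
        "log": log,
        "declared_load_seconds": declared_load_seconds,
        "action_seconds": action_seconds,
    }
```

In this sketch the time recorded around the "load" declaration covers no actual reading, and the log shows each row being loaded and transformed in turn during the action, which is why a field name implying a pure load time would be misleading.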
Many ML workloads, such as LogisticRegression, generate datasets of the form RDD[LabeledPoint] and require them as input. Converting back and forth between a weakly typed DataFrame and an RDD of LabeledPoint...
Instead of just taking in the path for one dataset, the SQL workload should take in a map of table name -> location and a query string with table names...
This issue was mistakenly referencing Logistic Regression. The Logistic Regression workload has been implemented whereas its accompanying data generator has not. However, the Linear Regression data generator has been implemented whereas...
The legacy label propagation workload appears to generate a graph inside the workload. In keeping with the standards established in the new version, the data generation should be extracted from the workload. There...
Port all workloads available in the legacy version to the new version.
The custom workload documentation currently assumes local mode, or at least a mode where a local file is accessible. Beef up the docs to cover the YARN, standalone, and local use...