Emily May Curtin

Results 35 issues of Emily May Curtin

GitHub automatically detected security vulnerabilities in the Gemfile for the docs site and raised an alert. This should be easy enough to update.

This applies to all output formats, but as an example, it would be nice to be able to write compressed CSV from Spark-Bench. To do this in code is...

Type: New Feature
Difficulty: Medium
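
As a language-neutral illustration of what compressed CSV output means in the request above, here is a minimal sketch that uses only Python's standard library rather than Spark or Spark-Bench. The function names and the choice of gzip as the codec are assumptions made for this example; they are not taken from the issue.

```python
import csv
import gzip


def write_compressed_csv(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file (hypothetical helper)."""
    # "wt" opens the gzip stream in text mode; newline="" lets the csv
    # module control line endings itself.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_compressed_csv(path):
    """Read the file back as a list of rows, header included (hypothetical helper)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

Because compression wraps the output stream rather than changing how rows are serialized, the same idea carries over to other output formats, which matches the issue's point that the request is not specific to CSV.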

Because Spark evaluates data lazily, it makes no distinction between loading and transforming data. The results field name should be changed accordingly so as not to mislead users.

Type: Maintenance
Difficulty: Easy
help wanted
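
The point about lazy evaluation can be illustrated with a small sketch. It uses Python generators as a stand-in for Spark's lazy execution, so it is an analogy rather than Spark code, and the names `load`, `transform`, and `stats` are hypothetical. It shows that no record is read when the pipeline is built, and that loading and transforming happen together once a result is requested, which is why a timing labelled as load time alone would be misleading.

```python
def load(records, stats):
    """Lazily yield records, counting each one that is actually read."""
    for record in records:
        stats["read"] += 1
        yield record


def transform(rows):
    """Lazily double each value; nothing runs until the result is consumed."""
    return (row * 2 for row in rows)


stats = {"read": 0}
pipeline = transform(load(range(5), stats))

# Building the pipeline has read nothing yet.
reads_before_action = stats["read"]

# Consuming the pipeline plays the role of a Spark action: load and
# transform now run interleaved, one record at a time.
result = list(pipeline)
reads_after_action = stats["read"]
```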

Many ML workloads, such as LogisticRegression, generate datasets of the form RDD[LabeledPoint] and require them as input. Converting back and forth between a weakly typed DataFrame and an RDD of LabeledPoint...

Type: New Feature
Difficulty: Medium
help wanted

Instead of just taking in the path for one dataset, the SQL workload should take in a map of table name -> location and a query string with table names...

Type: New Feature
Difficulty: Easy
help wanted
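
The proposed interface, a map of table name -> location plus a query string, can be sketched as follows. This is an assumption-laden analog, not the Spark-Bench implementation: it swaps Spark for an in-memory SQLite database from Python's standard library, assumes each location is a CSV file with a header row, and uses a hypothetical function name `run_query`. In Spark, the corresponding steps would presumably be registering each location as a temporary view and passing the query to `spark.sql`.

```python
import csv
import sqlite3


def run_query(tables, query):
    """Load each CSV in `tables` (name -> path) into SQLite, then run `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, path in tables.items():
            # Table names are interpolated into SQL, so accept only plain identifiers.
            if not name.isidentifier():
                raise ValueError(f"unsafe table name: {name!r}")
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)
                # Assumes column names contain no double quotes.
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f'CREATE TABLE "{name}" ({cols})')
                # Every value is stored as text; no type inference is attempted.
                conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', reader)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

A call such as `run_query({"users": "users.csv", "orders": "orders.csv"}, "SELECT ... FROM users JOIN orders ...")` then lets one query refer to several datasets by name, which is what the single-path input cannot express.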

This issue mistakenly referenced Logistic Regression. The Logistic Regression workload has been implemented, whereas its accompanying workload has not. However, the Linear Regression data generator has been implemented, whereas...

Type: New Feature
Difficulty: Medium

The legacy label propagation workload appears to generate a graph inside the workload. In keeping with the standards established in the new version, the data generation should be extracted. There...

Type: New Feature

Port all workloads available in legacy version to new version.

Type: Zenhub Epic

The custom workload documentation currently assumes local mode, or at least a mode where a local file is accessible. Beef up the docs to cover the yarn, standalone, and local use...

Difficulty: Medium
Type: Docs
help wanted