Emily May Curtin issues

Results: 35 issues of Emily May Curtin

Docs: Address security alerts related to Gemfile dependencies

GitHub automatically detected security vulnerabilities in the Gemfile for the docs site and raised alerts for them. The affected dependencies should be easy enough to update.

Add write options for formats

This applies to all output formats, but just as an example it would be nice to be able to write compressed csv from Spark-Bench. To do this in code is...

Type: New Feature

Difficulty: Medium
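
The request above amounts to letting a writer accept a map of per-format options. The following is a minimal sketch of that idea using only the Python standard library, not Spark-Bench code: the function name `write_csv` and the `options` dictionary with a `compression` key are hypothetical, chosen to mirror the `compression` option that Spark's own `DataFrameWriter` accepts for CSV output.

```python
import csv
import gzip
import io


def write_csv(rows, header, options=None):
    """Serialize rows to CSV bytes, honoring a hypothetical 'compression' option."""
    options = options or {}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    data = buf.getvalue().encode("utf-8")

    # Dispatch on the option value; unknown values fail loudly rather than
    # silently writing uncompressed output.
    compression = options.get("compression", "none")
    if compression == "gzip":
        return gzip.compress(data)
    if compression == "none":
        return data
    raise ValueError(f"unsupported compression: {compression}")
```

Calling `write_csv(rows, header, {"compression": "gzip"})` returns gzip-compressed bytes, while omitting the option keeps the plain CSV behavior, so existing configurations would not need to change.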

Change the name of load_time field in results to load_and_transform_time

Because data in Spark is lazily evaluated, it makes no distinction between loading and transforming data. The results field name should be changed accordingly so as not to mislead users.

Type: Maintenance

Difficulty: Easy

help wanted
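
The reasoning behind the rename can be illustrated with an analogy. The sketch below uses plain Python generators as a stand-in for Spark's lazy evaluation; it is not Spark or Spark-Bench code, and the names `load`, `transform`, and `run` are hypothetical. It records the order in which the steps actually execute.

```python
def load(n, log):
    """Lazy stand-in for reading n records; runs only when consumed."""
    for i in range(n):
        log.append(f"load {i}")
        yield i


def transform(records, log):
    """Lazy stand-in for a transformation chained onto the load."""
    for r in records:
        log.append(f"transform {r}")
        yield r * 2


def run(n):
    log = []
    pipeline = transform(load(n, log), log)  # builds the chain, executes nothing
    steps_before_action = len(log)
    result = list(pipeline)  # the "action": load and transform run interleaved
    return steps_before_action, result, log
```

Nothing is logged until the pipeline is consumed, and the load and transform steps then alternate record by record. A timer wrapped around the consuming call therefore measures both together, which is why a name like load_and_transform_time describes the measurement more accurately than load_time.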

Implement I/O for datasets of LabeledPoints

Many ML workloads such as LogisticRegression generate and require as input datasets of the form RDD[LabeledPoint]. Converting back and forth from a weakly typed dataframe to an RDD of LabeledPoint...

Type: New Feature

Difficulty: Medium

help wanted
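
As a rough illustration of the conversion described above, the sketch below round-trips between untyped rows and a typed labeled point. It is plain Python, not Spark MLlib: the `LabeledPoint` class here is a hypothetical stand-in for the Spark type, and `from_row`, `to_row`, and the `label_index` parameter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LabeledPoint:
    """Hypothetical stand-in for Spark's LabeledPoint: a label plus features."""
    label: float
    features: Tuple[float, ...]


def from_row(row: Sequence[str], label_index: int = 0) -> LabeledPoint:
    """Parse an untyped row, treating one column as the label."""
    values = [float(v) for v in row]
    label = values.pop(label_index)
    return LabeledPoint(label, tuple(values))


def to_row(point: LabeledPoint) -> list:
    """Flatten a labeled point back into a row with the label first."""
    return [point.label, *point.features]
```

Keeping both directions in one place means a workload that needs labeled points and a writer that needs flat rows would share a single definition of the column layout.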

The SQL workload needs an option for partitioning workload output

Type: Maintenance

Difficulty: Easy

Update SQL Workload to take in map of tables and table names

Instead of just taking in the path for one dataset, the SQL workload should take in a map of table name -> location and a query string with table names...

Type: New Feature

Difficulty: Easy

help wanted
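
The proposed interface (a map of table name -> location plus a query string that uses those names) can be sketched as follows. This is an analogy built on Python's standard `sqlite3` and `csv` modules, not the Spark-Bench SQL workload; the function name `run_query` and the choice of CSV files with a header row are assumptions made for the example.

```python
import csv
import sqlite3


def run_query(tables, query):
    """Register each {table_name: csv_path} entry, then run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, path in tables.items():
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                # Identifiers cannot be bound as parameters, so validate them
                # before interpolating into the DDL.
                for ident in [name, *header]:
                    if not ident.isidentifier():
                        raise ValueError(f"invalid identifier: {ident!r}")
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f'CREATE TABLE "{name}" ({cols})')
                conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', reader)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

With this shape, a query that joins several datasets only needs one more entry in the map, for example `run_query({"users": "users.csv", "orders": "orders.csv"}, "SELECT ... FROM users JOIN orders ...")`, where the file names are placeholders.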

Port Linear Regression workload from legacy

This issue originally referenced Logistic Regression by mistake. The Logistic Regression workload has been implemented whereas its accompanying workload has not. However, the Linear Regression data generator has been implemented whereas...

Type: New Feature

Difficulty: Medium

Type: Docs

help wanted

Investigate and Port Label Propagation workload from legacy

EPIC: Legacy Workload Ports

Document custom workload usage for Yarn, Standalone, Local