
something to help you spark

Results 15 silex issues

The hosted mint statistics were appropriate when Silex was hosted at freevariable.com. This commit changes the template to use the same Google Analytics setup as radanalytics.io. Fixes #74, indirectly.

https://github.com/radanalyticsio/silex/blob/18413a8b537af254d19da48bd644df43277852f3/src/jekyll/_includes/themes/bootstrap/default.html#L20 The script at http://freevariable.com/mint/?js has an https option that should be used.

A lot of silex components have no actual dependency on Spark. Ideally, these could be published in a sub-package so that people can consume them without a Spark dependency.

Issue from #56. Do you think it's worthwhile to apply this bias correction? https://en.wikipedia.org/wiki/Cram%C3%A9r's_V#Bias_correction They make it sound like a good idea, but I don't have any experience with it.
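For reference, here is a minimal sketch of the bias correction described in the linked Wikipedia section, written in plain Python rather than against silex's API. The function name `cramers_v_corrected` and the list-of-rows table format are assumptions made for illustration only; nothing here reflects how silex implements or would implement it.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts).

    Hypothetical helper: follows the correction on the Wikipedia page,
    assuming every row and column has at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need n >= 2 and no empty rows or columns")

    # Pearson chi-squared statistic from observed vs. expected counts.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Shrink phi^2 and the effective table dimensions by their small-sample bias.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table; a convention chosen for this sketch
    return sqrt(phi2_corr / denom)
```

On a weakly associated table such as `[[6, 4], [4, 6]]`, the uncorrected statistic is 0.2 while the corrected one is clipped to 0, which shows the kind of small-sample overestimate the correction is meant to address.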

Issue derived from #56. The Wikipedia page on Cramer's V mentions this: "The p-value for the significance of V is the same one that is calculated using the Pearson's chi-squared...

Implement Kendall's Tau, a measure of ordinal association. Ping @erikerlandson -- do you have an implementation sitting around you could easily make into a PR? :)
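As a starting point for discussion, here is a small sketch of the tau-b variant of Kendall's Tau in plain Python. It is a quadratic-time pairwise version for local data, not a distributed Spark implementation, and the name `kendall_tau_b` is made up for this example. Whether tau-a, tau-b, or another variant is wanted here is an open question.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b for two equal-length sequences of orderable values.

    Hypothetical helper: compares all pairs, so it costs O(n^2) time.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sx = (x1 > x2) - (x1 < x2)  # sign of the difference in x
        sy = (y1 > y2) - (y1 < y2)  # sign of the difference in y
        if sx == 0 and sy == 0:
            continue  # tied in both variables: contributes to neither term
        elif sx == 0:
            tied_x_only += 1
        elif sy == 0:
            tied_y_only += 1
        elif sx == sy:
            concordant += 1
        else:
            discordant += 1

    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom
```

For example, `kendall_tau_b([1, 2, 3], [1, 3, 2])` has two concordant pairs and one discordant pair, giving 1/3.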

When I tried to use the `sbt` script to build `silex`, the script reported an error about retrieving `sbt-launch.jar`. I noticed that it tried to use the old `artifactoryonline.com` repo....

```
[info] SplitSampleSpec:
[info] - should provide splitSample with integer argument
[info] - should provide weightedSplitSample with weights argument *** FAILED ***
[info] false was not true (split.scala:62)
```
https://travis-ci.org/willb/silex/jobs/119218111...

`IIDFeatureSamplingMethodsRDDSpec` produces warnings about containing large tasks. These warnings should be suppressed by reducing the logging level, which would make the test output easier to read.

[Cramer's V](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V) is a measure of association between nominal (categorical) variables. It is useful for feature selection, comparing clusterings, and potentially evaluating splits in decision trees trained on purely categorical data.
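
To make the definition concrete, here is a minimal local sketch in plain Python of the uncorrected statistic, V = sqrt(chi2 / (n * (min(r, k) - 1))), computed from two parallel sequences of labels. The function name `cramers_v` and its input format are assumptions for illustration; this is not the silex API or a Spark implementation.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of category labels.

    Hypothetical helper for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed count for each (x, y) cell
    row = Counter(xs)             # marginal counts for x
    col = Counter(ys)             # marginal counts for y
    r, k = len(row), len(col)
    if n == 0 or min(r, k) < 2:
        return 0.0  # no variation in one variable; a convention for this sketch

    # Pearson chi-squared over every cell, including cells with zero observations.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (min(r, k) - 1)))
```

Two perfectly aligned labelings, such as `cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])`, give 1.0, while independent labelings give 0.0.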