silex
something to help you spark
The hosted mint statistics were appropriate when Silex was hosted at freevariable.com. This commit changes the template to use the same Google Analytics setup as radanalytics.io. Fixes #74, indirectly.
https://github.com/radanalyticsio/silex/blob/18413a8b537af254d19da48bd644df43277852f3/src/jekyll/_includes/themes/bootstrap/default.html#L20 The script at http://freevariable.com/mint/?js is also available over https, and that option should be used.
Many silex components have no actual dependency on Spark. Ideally, these could be published in a sub-package so that people can consume them without a Spark dependency.
Issue from #56. Do you think it's worthwhile to apply this bias correction? https://en.wikipedia.org/wiki/Cram%C3%A9r's_V#Bias_correction They make it sound like a good idea but I don't have any experience with it.
Issue derived from #56. The Wikipedia page on Cramer's V mentions this: "The p-value for the significance of V is the same one that is calculated using the Pearson's chi-squared...
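For reference, a minimal sketch of the bias correction described on the linked Wikipedia page, written in plain Python rather than against silex or Spark. The function name `cramers_v_corrected` and the list-of-rows contingency table input are assumptions made for illustration, and the sketch assumes every row and column total is nonzero.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows of counts.

    Hypothetical helper for illustration; assumes nonzero row and column totals.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    if n < 2:
        raise ValueError("need at least two observations")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Correction: shrink phi^2 and the table dimensions by their expected bias.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # convention chosen here for degenerate tables
    return sqrt(phi2_corr / denom)
```

On a weakly associated table such as `[[6, 4], [4, 6]]` the uncorrected statistic is 0.2, while the corrected value is clamped to 0, which shows how much the correction can matter for small samples.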
Implement Kendall's Tau, a measure of ordinal association. Ping @erikerlandson -- do you have an implementation sitting around you could easily make into a PR? :)
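As a reference point for what such an implementation would compute, here is a minimal sketch of Kendall's tau-b using direct O(n²) pair counting in plain Python. It is not a silex or Spark API; the name `kendall_tau_b` is hypothetical, and a distributed implementation would need a different strategy than enumerating all pairs.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(xs, ys):
    """Kendall's tau-b for two equal-length sequences of ordinal values (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx = (x1 > x2) - (x1 < x2)  # sign of the difference in x
        dy = (y1 > y2) - (y1 < y2)  # sign of the difference in y
        if dx == 0 and dy == 0:
            continue                # tied in both variables: ignored by tau-b
        elif dx == 0:
            ties_x += 1             # tied in x only
        elif dy == 0:
            ties_y += 1             # tied in y only
        elif dx == dy:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    # Convention chosen here: return 0.0 when either variable is constant.
    return (concordant - discordant) / denom if denom else 0.0
```

For example, `kendall_tau_b([1, 2, 3], [1, 3, 2])` has two concordant pairs and one discordant pair, giving 1/3.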
When I tried to use the `sbt` script to build `silex`, the script reported an error about retrieving `sbt-launch.jar`. I noticed that it tried to use the old `artifactoryonline.com` repo....
```
[info] SplitSampleSpec:
[info] - should provide splitSample with integer argument
[info] - should provide weightedSplitSample with weights argument *** FAILED ***
[info] false was not true (split.scala:62)
```
https://travis-ci.org/willb/silex/jobs/119218111...
`IIDFeatureSamplingMethodsRDDSpec` produces warnings about containing large tasks. Reducing the logging level for this spec would suppress them and make the test output easier to read.
[Cramer's V](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V) is a measure of association between nominal (categorical) variables. It is useful for feature selection, comparing clusterings, potentially evaluating splits in Decision Trees trained on purely categorical data, etc.
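To make the statistic concrete, the following is a minimal single-machine sketch in plain Python, not the silex implementation. It computes V = sqrt(χ² / (n · (min(r, k) − 1))) from two sequences of category labels; the function name `cramers_v` and the choice to return 0 when either variable has fewer than two categories are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of categorical labels (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    r, k = len(row), len(col)
    if n == 0 or min(r, k) < 2:
        return 0.0                 # convention chosen here for degenerate input

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for a, na in row.items():
        for b, nb in col.items():
            expected = na * nb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (min(r, k) - 1)))
```

Two variables that determine each other, such as `["a", "b", "a", "b"]` and `["x", "y", "x", "y"]`, score 1, while independent variables score 0.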