
Scalable machine learning library for Apache Hive/Spark/Pig


```sql
create table hyperparams as
WITH dual as (
  select 1
)
select
  gridsearch(array('linear','kernel'), array('lambda 1', 'lambda 2')) as params
from
  dual
;

create table hyperparams as
select '-linear...
```

enhancement
call-for-contribution
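The proposed `gridsearch` UDF appears to expand per-parameter option lists into candidate hyperparameter settings. Assuming it emits one row per combination (the cartesian product of its input arrays; the snippet above is truncated, so this is a guess), the expansion can be sketched in Python. `grid_search` is a hypothetical stand-in, not a Hivemall function.

```python
from itertools import product


def grid_search(*option_lists):
    """Hypothetical stand-in for the proposed `gridsearch` UDF.

    Assumes the UDF yields every combination of the given option arrays,
    i.e. their cartesian product, one candidate setting per row.
    """
    return list(product(*option_lists))


if __name__ == "__main__":
    for params in grid_search(["linear", "kernel"], ["lambda 1", "lambda 2"]):
        print(params)
```

Each tuple would correspond to one row of the `hyperparams` table, so a grid of k arrays with sizes s1, ..., sk produces s1 × ... × sk candidate settings.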

http://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf https://github.com/intentmedia/admm http://imi.kyushu-u.ac.jp/~waki/ws2013/slide/suzuki.pdf

enhancement
call-for-contribution
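The Boyd et al. paper linked above shows how ADMM splits model fitting across data partitions. As a rough, dependency-free illustration (not Hivemall code), here is global-consensus lasso along the lines of that paper: each worker i solves a local ridge-like subproblem for x_i, the driver averages x_i + u_i and soft-thresholds to get the consensus z, and each worker updates its scaled dual u_i. The function names and the small Gaussian-elimination solver are our own assumptions.

```python
def solve(M, v):
    """Solve M x = v for a small dense matrix M (Gaussian elimination, partial pivoting)."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding operator S_kappa."""
    return [max(0.0, abs(t) - kappa) * (1.0 if t > 0 else -1.0) for t in v]


def consensus_lasso(blocks, lam, rho=1.0, iters=200):
    """Consensus ADMM for 1/2 * sum_i ||A_i x - b_i||^2 + lam * ||x||_1.

    blocks: list of (A_i, b_i) pairs, one per worker; A_i is a list of rows.
    """
    n = len(blocks[0][0][0])  # number of features
    N = len(blocks)
    # Per-worker constants: A_i^T A_i + rho*I and A_i^T b_i.
    local = []
    for A, b in blocks:
        M = [[sum(row[i] * row[j] for row in A) + (rho if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
        local.append((M, Atb))
    z = [0.0] * n
    us = [[0.0] * n for _ in blocks]  # scaled dual variables u_i
    for _ in range(iters):
        # x-update: independent per worker, so it can run in parallel.
        xs = [solve(M, [Atb[j] + rho * (z[j] - u[j]) for j in range(n)])
              for (M, Atb), u in zip(local, us)]
        # z-update: the only global step (average, then soft-threshold).
        avg = [sum(x[j] + u[j] for x, u in zip(xs, us)) / N for j in range(n)]
        z = soft_threshold(avg, lam / (N * rho))
        # u-update: each worker accumulates its disagreement with z.
        for x, u in zip(xs, us):
            for j in range(n):
                u[j] += x[j] - z[j]
    return z
```

For example, `consensus_lasso([([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.2])] * 2, lam=1.0)` fits two identical identity-design blocks; the second coefficient is driven to zero by the L1 penalty. In a distributed setting only x_i + u_i needs to be gathered per iteration.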

This PR enables compile-time coding style checks using `maven-checkstyle-plugin`. Run:

```
$ mvn clean compile
```

and it shows the warning and error messages, based on Google...

discussions
pullrequest
under-review

https://www.jair.org/media/953/live-953-2037-jair.pdf

enhancement
call-for-contribution
high-priority
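The linked JAIR article (doi 10.1613/jair.953) is the SMOTE (Synthetic Minority Over-sampling Technique) paper. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. A plain-Python sketch; the function name, Euclidean distance, round-robin choice of base samples, and seeded RNG are our assumptions, not a Hivemall API.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling sketch for a list of minority-class feature vectors.

    Cycles through the minority samples; for each, picks one of its k nearest
    minority neighbours (Euclidean distance) and returns a random point on the
    segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_synthetic):
        idx = i % len(minority)
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows.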

https://en.wikipedia.org/wiki/Lift_(data_mining)

enhancement
call-for-contribution
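Lift measures how much more often two items co-occur than they would if they were independent: lift(A → B) = P(A ∧ B) / (P(A) · P(B)) = confidence(A → B) / support(B). A minimal Python sketch over a list of transactions; the function name and the item-set representation are illustrative assumptions, and in Hive this would presumably be computed from co-occurrence counts instead.

```python
def lift(transactions, a, b):
    """Lift of the association rule a -> b over a list of item sets.

    lift = P(a and b) / (P(a) * P(b)) = confidence(a -> b) / support(b).
    Values > 1 mean a and b co-occur more often than under independence.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    if n_a == 0 or n_b == 0:
        raise ValueError("lift is undefined when an item never occurs")
    # (n_ab / n) / ((n_a / n) * (n_b / n)) simplifies to:
    return n_ab * n / (n_a * n_b)


if __name__ == "__main__":
    tx = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"bread"}, {"eggs"}]
    print(lift(tx, "milk", "bread"))
```

Here milk and bread each appear in 3 of 5 baskets and together in 2, so lift = (2/5) / ((3/5)(3/5)) = 10/9 ≈ 1.11, a mild positive association. Note that lift is symmetric in a and b.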

```sql
WITH fv as (
  select
    itemid,
    collect_list(other) as features, -- array
    collect_list(cnt) as weight -- array
  from
    cooccurrence
  group by
    itemid
)
select
  itemid,
  feature_vector(features, weight) as fv...
```

enhancement
call-for-contribution
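The query above groups co-occurrence rows per item into two parallel arrays (the co-occurring items and their counts) and passes them to a proposed `feature_vector` UDF. Assuming that UDF would zip the arrays into Hivemall's `feature:value` strings (the snippet is truncated, so this is a guess), the same pipeline can be sketched in Python; both function names are hypothetical.

```python
from collections import defaultdict


def feature_vector(features, weights):
    """Hypothetical equivalent of the proposed UDF: zip parallel arrays into "feature:weight" strings."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return [f"{f}:{w}" for f, w in zip(features, weights)]


def item_feature_vectors(cooccurrence):
    """cooccurrence: iterable of (itemid, other, cnt) rows, like the SQL source table."""
    features = defaultdict(list)  # plays the role of collect_list(other)
    weights = defaultdict(list)   # plays the role of collect_list(cnt)
    for itemid, other, cnt in cooccurrence:
        features[itemid].append(other)
        weights[itemid].append(cnt)
    return {item: feature_vector(features[item], weights[item]) for item in features}
```

Collecting both lists in the same pass keeps them aligned here. In Hive, two separate `collect_list` calls in one `group by` are not documented to guarantee a matching order, which is one argument for a UDF or UDAF that builds the pairs in a single step.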

Since there is currently no xgboost library in the Maven Central repository, we have bundled it in the hivemall package (https://github.com/myui/hivemall/tree/master/xgboost/lib). This is a workaround (I followed [this approach](https://eureka.ykyuen.info/2014/06/10/maven-include-system-scope-dependency-in-maven-assembly-plugin/)) and...

enhancement

# Feature selection

Feature selection is the process of selecting a subset of influential features from a larger set of features. It is an important technique to **enhance results**, **shorten training time**...

enhancement
discussions
WIP
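The truncated issue text does not say which selection criterion is proposed. One common filter method scores each non-negative feature with a chi-squared statistic against the class label (the formulation used by scikit-learn's `chi2`) and keeps the top k. A plain-Python sketch of that technique; the function names are our own.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature column against class labels.

    observed[c] = sum of the feature over rows of class c;
    expected[c] = (fraction of rows in class c) * (total sum of the feature).
    """
    n, m = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(m):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices of the k features with the highest chi-squared score (ties keep column order)."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

A feature that is spread evenly across classes scores 0, while one concentrated in a single class scores high, so the filter drops uninformative columns before training, which is where the shorter training time comes from.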

Related to #345: Hive UDF invocation is slow in Spark. We can do better, at least for UDFs (not yet for UDAFs/UDTFs), by implementing Spark's Java [UDF{1,...,22}](https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java) interfaces as well as...

enhancement
discussions
call-for-contribution

A parameter server is a framework for asynchronously sharing parameters among machine learning workers to achieve higher scalability. `Hivemall` currently has a standalone server implementation, called the [MIX server](https://github.com/myui/hivemall/tree/master/mixserv), to asynchronously...

enhancement
discussions
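To make the pull/push pattern concrete, here is a toy in-process sketch: workers pull the current weights they need, compute an SGD update on their own data shard, and push deltas that the server applies atomically. This is a generic illustration using threads, not the MIX server's protocol; `ParameterServer`, `worker`, and the squared-loss update are hypothetical.

```python
import threading


class ParameterServer:
    """Toy in-process parameter server: a lock-protected dict of weights."""

    def __init__(self):
        self._weights = {}
        self._lock = threading.Lock()

    def pull(self, keys):
        """Return the current values of the requested weights (0.0 if unseen)."""
        with self._lock:
            return {k: self._weights.get(k, 0.0) for k in keys}

    def push(self, deltas):
        """Atomically add the given deltas to the stored weights."""
        with self._lock:
            for k, d in deltas.items():
                self._weights[k] = self._weights.get(k, 0.0) + d

    def snapshot(self):
        with self._lock:
            return dict(self._weights)


def worker(server, examples, lr=0.1):
    """SGD on one shard of (features, label) pairs with squared loss.

    Pulled weights may already be stale when the delta is pushed; this
    asynchrony is what lets workers proceed without waiting for each other.
    """
    for features, label in examples:
        w = server.pull(features.keys())
        pred = sum(w[k] * v for k, v in features.items())
        grad = pred - label  # derivative of 1/2 * (pred - label)^2 w.r.t. pred
        server.push({k: -lr * grad * v for k, v in features.items()})


if __name__ == "__main__":
    ps = ParameterServer()
    shards = [[({"x": 1.0}, 2.0)] * 100, [({"x": 1.0}, 2.0)] * 100]
    threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(ps.snapshot())  # the weight for "x" should end up close to 2.0
```

A real parameter server shards the key space across machines and communicates over the network, but the contract (pull possibly stale values, push additive updates) is the same.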