flink-ml
[FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor
What is the purpose of the change
Add Transformer and Estimator for GBTClassifier and GBTRegressor.
Details about features compared to SparkML's implementation are as follows:
- Implemented in this PR: basic binary classification and regression (squared loss only).
- Implemented here but not supported in SparkML: second-order approximation of the loss function as the impurity measure (this is an important feature supported by XGBoost and LightGBM [1]).
- Not implemented yet, but parameters added: early stopping with validation set, encoding with leaf id, and weight columns.
- Not implemented yet: classification threshold, absolute loss for regressor, feature importance, and 1st-order gradient.
- Not expected to be supported: maxMemoryInMB, cacheNodeIds, and checkpointInterval.
[1] https://xgboost.readthedocs.io/en/stable/tutorials/model.html#the-structure-score
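For reviewers unfamiliar with the second-order approximation mentioned above, the following is a minimal sketch of the structure score and split gain described in [1]. It is not the code in this PR: the class name, method names, and the lambda/gamma regularization parameters are hypothetical, and the squared loss is assumed to be defined as 0.5 * (pred - label)^2.

```java
/** Hypothetical helper for illustration only; not part of this PR. */
final class SecondOrderSplitGain {

    /** Gradient and Hessian of 0.5 * (pred - label)^2 with respect to pred (assumed loss form). */
    static double[] squaredLossGradHess(double label, double pred) {
        return new double[] {pred - label, 1.0};
    }

    /** Gradient and Hessian of log loss with respect to the raw score; label is 0 or 1. */
    static double[] logLossGradHess(double label, double rawScore) {
        double p = 1.0 / (1.0 + Math.exp(-rawScore));
        return new double[] {p - label, p * (1.0 - p)};
    }

    /** Structure score of a node: G^2 / (H + lambda), where G and H are summed over its samples. */
    static double score(double g, double h, double lambda) {
        return g * g / (h + lambda);
    }

    /** Gain of a split: half the score improvement of the children over the parent, minus gamma. */
    static double gain(
            double gLeft, double hLeft, double gRight, double hRight,
            double lambda, double gamma) {
        double children = score(gLeft, hLeft, lambda) + score(gRight, hRight, lambda);
        double parent = score(gLeft + gRight, hLeft + hRight, lambda);
        return 0.5 * (children - parent) - gamma;
    }

    /** Optimal leaf weight under the second-order approximation: -G / (H + lambda). */
    static double leafWeight(double g, double h, double lambda) {
        return -g / (h + lambda);
    }
}
```

As a worked case under these assumptions: with labels {1, 1} on the left, {3, 3} on the right, an initial prediction of 0, and lambda = gamma = 0, the summed gradients are -2 and -6 with Hessians 2 and 2, so the gain is 0.5 * (2 + 18 - 16) = 2 and the leaf weights are 1 and 3.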
Brief change log
- Add implementation of gradient-boosting trees.
- Add Transformer and Estimator for GBTClassifier and GBTRegressor.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): yes
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? JavaDocs
Hi @lindong28, thanks for your valuable comments. I've updated the PR based on your comments and our offline discussions. Please take a look.