[FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor

Open Fanoid opened this issue 2 years ago • 1 comments

What is the purpose of the change

Add Transformer and Estimator for GBTClassifier and GBTRegressor.

Details about features compared to SparkML's implementation are as follows:

  • Implemented in this PR: basic binary classification and regression (squared loss only).
  • Implemented here but not supported in SparkML: second-order approximation of the loss function as the impurity measure (an important feature supported by XGBoost and LightGBM [1]).
  • Parameters added but not implemented yet: early stopping with a validation set, encoding with leaf id, and weight columns.
  • Not implemented yet: classification threshold, absolute loss for regression, feature importance, and 1st-order gradient.
  • Not expected to be supported: maxMemoryInMB, cacheNodeIds, and checkpointInterval.

[1] https://xgboost.readthedocs.io/en/stable/tutorials/model.html#the-structure-score
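
For readers unfamiliar with the second-order approximation mentioned above, the following is a minimal Python sketch of the structure score described in [1]. It is not the code in this PR: the names `GradStats`, `squared_loss_stats`, `leaf_weight`, and `split_gain`, and the parameters `reg_lambda` and `gamma`, are hypothetical and chosen only for illustration. It assumes the squared loss 1/2 (pred - label)^2, for which the per-sample gradient is pred - label and the hessian is 1.

```python
from dataclasses import dataclass


@dataclass
class GradStats:
    """Sums of first-order (grad) and second-order (hess) loss derivatives."""
    grad: float
    hess: float

    def __add__(self, other):
        return GradStats(self.grad + other.grad, self.hess + other.hess)


def squared_loss_stats(labels, preds):
    """Aggregate derivatives of 1/2 * (pred - label)^2 over a set of samples."""
    grad = sum(p - y for y, p in zip(labels, preds))
    hess = float(len(labels))  # the second derivative is 1 for every sample
    return GradStats(grad, hess)


def leaf_weight(stats, reg_lambda=0.0):
    """Optimal leaf output under the second-order approximation: -G / (H + lambda)."""
    return -stats.grad / (stats.hess + reg_lambda)


def _score(stats, reg_lambda):
    # G^2 / (H + lambda): larger means the leaf reduces the loss more.
    return stats.grad ** 2 / (stats.hess + reg_lambda)


def split_gain(left, right, reg_lambda=0.0, gamma=0.0):
    """Loss reduction from splitting a node into `left` and `right`."""
    parent = left + right
    return 0.5 * (
        _score(left, reg_lambda)
        + _score(right, reg_lambda)
        - _score(parent, reg_lambda)
    ) - gamma


# Four samples, current prediction 0 everywhere, split between the 2nd and 3rd.
left = squared_loss_stats([1.0, 1.0], [0.0, 0.0])
right = squared_loss_stats([3.0, 3.0], [0.0, 0.0])
gain = split_gain(left, right)
```

With no regularization, each leaf weight in this example equals the mean residual of its samples, which is the behaviour expected from squared loss; a split is kept only if its gain is positive.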

Brief change log

  • Add implementation of gradient-boosting trees.
  • Add Transformer and Estimator for GBTClassifier and GBTRegressor.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

Fanoid, Feb 10 '23 09:02

Hi @lindong28, thanks for your valuable comments. I've updated the PR based on your comments and our offline discussions. Please take a look.

Fanoid, Mar 01 '23 08:03