
[Roadmap] Multiple outputs.

Open trivialfis opened this issue 3 years ago • 18 comments

Since XGBoost 1.6, we have been working on multi-output support for the tree model. In 2.0, we will have the initial implementation of the vector-leaf-based multi-output model. This issue serves as a tracker for future development and related discussion. The original feature request is here: https://github.com/dmlc/xgboost/issues/2087 . The features tracked here concern vector-leaf rather than general multi-output support.

Feel free to share your suggestions or make related feature requests in the comments.

Implementation Optimization

  • [ ] Use F-order for the gradient. Currently, the gradient has one column per target but is written in C-order, and converting it takes about one-fifth of the training time. (#9508)
  • [x] Use F-order for the custom objective. (#9089)
  • [x] Improve array type dispatching by moving the dispatch logic from per-element to per-array. This enables us to have a more efficient custom objective interface. (#9090)
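A small NumPy illustration of the layout issue (not XGBoost code; the sizes are hypothetical): with one gradient column per target, C-order stores each target's values strided across rows, while F-order stores them as one contiguous run. Converting between the two is a full copy of the matrix.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
n_samples, n_targets = 6, 3

# One gradient column per target, stored row-major (C-order): consecutive
# samples of the same target sit n_targets elements apart in memory.
grad_c = np.arange(n_samples * n_targets, dtype=np.float32).reshape(n_samples, n_targets)

# Column-major (F-order) copy: each target's gradients form one contiguous run.
grad_f = np.asfortranarray(grad_c)

assert np.array_equal(grad_c, grad_f)  # same logical values, different memory order
print(grad_c.strides)  # (12, 4): byte steps between rows and between columns
print(grad_f.strides)  # (4, 24)

# Reading F-order memory front to back yields target 0's gradients first.
target0 = grad_f.ravel(order="K")[:n_samples]
print(target0.tolist())  # [0.0, 3.0, 6.0, 9.0, 12.0, 15.0]
```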

Algorithmic Optimization

We are still looking for potential algorithmic optimizations for vector-leaf, and here's the pool of candidates. We need to survey all available options. Feel free to share if you have ideas or paper recommendations.

  • [ ] SketchBoost.
  • [ ] https://arxiv.org/abs/2201.06239
  • [ ] Extra tree.

(#11798)
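For reference, the core idea behind gradient sketching (the family SketchBoost belongs to) is to score candidate splits on a low-dimensional projection of the gradient matrix instead of all k target columns. The block below assumes a Gaussian random projection; it is not XGBoost code, and `sketch_gradients` is a hypothetical helper, not necessarily the variant any particular paper recommends.

```python
import numpy as np


def sketch_gradients(grad: np.ndarray, sketch_dim: int, seed: int = 0) -> np.ndarray:
    """Reduce an (n_samples, n_targets) gradient matrix to sketch_dim columns.

    Gaussian random projection scaled by 1/sqrt(sketch_dim), so each row's
    squared norm is preserved in expectation. Intended only for scoring
    candidate splits; leaf values would still use the full gradients.
    """
    n_targets = grad.shape[1]
    if sketch_dim >= n_targets:
        return grad  # sketching would not reduce the work
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((n_targets, sketch_dim)) / np.sqrt(sketch_dim)
    return grad @ proj


# Usage: 50 targets squeezed into 8 columns for split search.
rng = np.random.default_rng(42)
grad = rng.standard_normal((1000, 50))
sketch = sketch_gradients(grad, sketch_dim=8)
print(sketch.shape)  # (1000, 8)
```

As we understand the approach, the sketch only decides the tree structure; leaf values would still be computed from the full gradient and Hessian, so the approximation affects split quality but not the leaf fit itself.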

GPU Implementation

  • [ ] Evaluation (#11781)
  • [ ] Histogram (#11781)
  • [x] Prediction (#11752)
  • [ ] Prediction cache.
  • [ ] Model (#11277)
  • [ ] Partition. (#11789)
  • [ ] Gradient sampling.
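As a CPU reference for what the Histogram and Evaluation items compute for vector leaf, the NumPy sketch below accumulates a per-bin histogram of gradient vectors for one feature and scans it for the best split, summing the per-target score. It assumes a diagonal Hessian approximation and leaves out the 1/2 factor, `gamma`, `min_child_weight`, and missing-value handling; the function names are hypothetical and this is not the GPU kernel design.

```python
import numpy as np


def build_histogram(bins, grad, hess, n_bins):
    """Per-bin sums of gradient and Hessian vectors for one feature.

    bins: (n_samples,) integer bin index of each sample for this feature.
    grad, hess: (n_samples, n_targets) per-target derivatives.
    """
    n_targets = grad.shape[1]
    g_hist = np.zeros((n_bins, n_targets))
    h_hist = np.zeros((n_bins, n_targets))
    np.add.at(g_hist, bins, grad)  # unbuffered scatter-add, safe for repeated bins
    np.add.at(h_hist, bins, hess)
    return g_hist, h_hist


def leaf_score(g, h, reg_lambda):
    """Sum over targets of G^2 / (H + lambda), i.e. a diagonal-Hessian score."""
    return np.sum(g * g / (h + reg_lambda), axis=-1)


def best_split(g_hist, h_hist, reg_lambda=1.0):
    """Scan bin boundaries; return (best_bin, gain), left child gets bins <= best_bin."""
    g_total, h_total = g_hist.sum(axis=0), h_hist.sum(axis=0)
    g_left = np.cumsum(g_hist, axis=0)[:-1]  # prefix sums for each boundary
    h_left = np.cumsum(h_hist, axis=0)[:-1]
    gain = (leaf_score(g_left, h_left, reg_lambda)
            + leaf_score(g_total - g_left, h_total - h_left, reg_lambda)
            - leaf_score(g_total, h_total, reg_lambda))
    best = int(np.argmax(gain))
    return best, float(gain[best])


# Two targets whose gradients both change sign between bin 1 and bin 2.
bins = np.array([0, 0, 1, 1, 2, 2, 3, 3])
grad = np.array([[-1.0, -1.0]] * 4 + [[1.0, 1.0]] * 4)
hess = np.ones_like(grad)
g_hist, h_hist = build_histogram(bins, grad, hess, n_bins=4)
print(best_split(g_hist, h_hist))  # (1, 12.8)
```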

Documentation

  • [ ] Derive the approximated Hessian in the context of boosting trees.
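As a possible starting point for that derivation: extend the standard second-order XGBoost objective to a leaf weight vector w of length k (one entry per target), and assume the per-sample Hessian across targets is approximated by its diagonal h_i. Then the objective separates by target. Here g_i and h_i are per-sample vectors of length k, λ is the L2 penalty, and γ is the split penalty. For losses such as softmax cross-entropy the true Hessian is not diagonal, which is exactly where the approximation enters.

```latex
% Leaf with instance set I, k targets, weight vector w \in \mathbb{R}^k
\tilde{\mathcal{L}}(w) = \sum_{i \in I} \Big( g_i^\top w + \tfrac{1}{2}\, w^\top H_i\, w \Big) + \tfrac{\lambda}{2} \lVert w \rVert^2

% Diagonal approximation H_i \approx \operatorname{diag}(h_i), with G = \sum_{i \in I} g_i and H = \sum_{i \in I} h_i
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\tilde{\mathcal{L}}(w^{*}) = -\frac{1}{2} \sum_{j=1}^{k} \frac{G_j^2}{H_j + \lambda}

% Split gain, summed over targets
\text{gain} = \frac{1}{2} \sum_{j=1}^{k} \left( \frac{G_{L,j}^2}{H_{L,j} + \lambda} + \frac{G_{R,j}^2}{H_{R,j} + \lambda} - \frac{G_j^2}{H_j + \lambda} \right) - \gamma
```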

Multi-task

  • [ ] Multi-task xgboost. This is not yet decided. I think it's wise to at least do some exploration before building out the rest of the implementation, since we will need a very different interface if we have to support multi-task. Related: https://github.com/dmlc/xgboost/issues/7693 .

Features

  • [ ] Tree SHAP
  • [x] Plotting (https://github.com/dmlc/xgboost/pull/10093)
  • [x] Model text dump (JSON, txt, graphviz) (#10093, #11747)
  • [ ] Tree data frame.
  • [ ] Categorical feature.
  • [ ] Constraints
  • [ ] Approx tree method
  • [ ] Exact tree method
  • [ ] Loss weight
  • [ ] Feature importance (be careful with tree index) (https://github.com/dmlc/xgboost/pull/10700)
  • [x] Intercept. (#11656)
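The tree-index caveat on feature importance arises because the two strategies store a different number of trees per boosting round: as we understand the current layout, one model per target adds n_targets * num_parallel_tree trees per round (grouped by target), while vector leaf adds num_parallel_tree trees that each cover all targets. A hypothetical helper (not part of the XGBoost API) mapping a flat tree index back to its round and target under that assumption:

```python
def tree_position(tree_idx, n_targets, num_parallel_tree=1, vector_leaf=False):
    """Map a flat tree index to (boosting_round, target).

    Hypothetical helper. Assumes trees are stored round by round and, for the
    one-model-per-target strategy, grouped by target inside each round.
    target is None for vector-leaf trees, which predict all targets at once.
    """
    if vector_leaf:
        return tree_idx // num_parallel_tree, None
    trees_per_round = n_targets * num_parallel_tree
    round_idx, offset = divmod(tree_idx, trees_per_round)
    return round_idx, offset // num_parallel_tree


print(tree_position(7, n_targets=3))                    # (2, 1)
print(tree_position(7, n_targets=3, vector_leaf=True))  # (7, None)
```

Per-target importances for the one-model-per-target case could then be aggregated by grouping trees on the returned target; for vector leaf there is no per-tree target to group on.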

Learning to rank

A ranking model could take multiple criteria into account. This might require multi-task support.

Quantile regression

  • [ ] l1
  • [ ] quantile

Distributed

  • [ ] Dask
  • [ ] PySpark
  • [ ] Spark
  • [ ] Flink?
  • [ ] Federated (https://github.com/dmlc/xgboost/pull/9171)

Binding

  • [ ] R (https://github.com/dmlc/xgboost/pull/9526)
  • [ ] Scala
  • [x] Python
  • [ ] Java
  • [ ] C

HPO

  • [ ] Check compatibility with major HPO frameworks.

Other extensions

  • [ ] Sparse label. (multi-label classification optimization)
  • [ ] Missing label.
  • [ ] Early stopping for each target?
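For the missing-label item, one common approach is to mask the loss: entries without a label contribute zero gradient and zero Hessian. A minimal NumPy sketch for squared error, assuming the mask is supplied separately (`masked_squared_error` and its `observed` argument are hypothetical; how such a mask would reach the booster is an open design question):

```python
import numpy as np


def masked_squared_error(y_true, y_pred, observed):
    """Squared-error gradient/Hessian that ignores unobserved labels.

    observed: boolean mask with the same shape as y_true (hypothetical input).
    Masked entries get zero gradient and Hessian, so they add nothing to
    histograms, split gains, or leaf values for that target.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64).reshape(y_true.shape)
    observed = np.asarray(observed, dtype=bool)
    grad = np.where(observed, y_pred - y_true, 0.0)  # d/dpred of 0.5 * (pred - y)^2
    hess = observed.astype(np.float64)
    return grad, hess


y = np.array([[1.0, 0.0], [2.0, 5.0]])  # y[0, 1] is a placeholder for a missing label
mask = np.array([[True, False], [True, True]])
grad, hess = masked_squared_error(y, np.zeros(4), mask)
print(grad.tolist())  # [[-1.0, 0.0], [-2.0, -5.0]]
print(hess.tolist())  # [[1.0, 0.0], [1.0, 1.0]]
```

Under the diagonal-Hessian view, a leaf in which a target has no observed labels gets G = H = 0 for that target, so its weight falls back to -0/(0 + λ) = 0 as long as λ > 0.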

Applications

  • https://arxiv.org/abs/2210.06831
  • [ ] FIL

Benchmarks

  • [ ] Collection of datasets for future comparison.

trivialfis, Apr 17 '23

Hi, great work on the initial multi-target implementation!

Given the roadmap, when can we expect GPU support for multi-output regression? When this support is added, will xgboost-ray also support it?

CarloLepelaars, Oct 30 '23

Hi @CarloLepelaars ,

  • For model-per-target, it's already implemented.
  • For vector leaf, it will take some more work, but eventually yes. I don't have an ETA for when it will be available.
  • I think there is ongoing work on xgboost-ray, but you will need to open an issue on that repository for concrete answers.

trivialfis, Oct 31 '23

Hi, very nice work! I am wondering how SHAP should be used for multi-output models, e.g. how to explain links between the Ys and how to interpret the effects of the Xs: which Xs display common effects across the Ys, and which display differential effects? Do you know a good example of using SHAP for a multi-output model?

wiktorolszowy, Nov 23 '23

For model-per-target, it's the same as for a single target. As for vector leaf, I haven't looked into it yet, but no significant difference comes to mind.

trivialfis, Nov 27 '23

I am currently experimenting with the multi-target approach and have a hard time defining a custom metric (I haven't tried a custom loss yet). The predictions seem to be a flat array of size len(y) * len(targets), while y_true has shape (len(y), len(targets)). I managed to handle this inside my metric so that it returns one value, but now I get an error about an output being a tuple instead of a number. Is there a way to handle this properly, or is it too early?

lcrmorin, Dec 25 '23
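One plausible source of the tuple-versus-number error is mixing the two custom-metric conventions: as far as we know, the native `xgb.train(..., custom_metric=...)` callback returns a `(name, value)` pair, while a callable `eval_metric` in the scikit-learn wrapper is expected to return a single number. A sketch of the latter, assuming flat predictions are laid out row by row (`multi_target_rmse` is a hypothetical name):

```python
import numpy as np


def multi_target_rmse(y_true, y_pred):
    """Scikit-learn-style custom metric: return one float, not a (name, value) tuple."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Predictions may arrive flattened; restore (n_samples, n_targets), row-major.
    y_pred = np.asarray(y_pred, dtype=np.float64).reshape(y_true.shape)
    # Average the squared error over both samples and targets.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


print(multi_target_rmse([[1, 2], [3, 4]], [1, 2, 3, 6]))  # 1.0
```

It could then be passed as `eval_metric=multi_target_rmse` to `xgb.XGBRegressor`.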

Hi.

Has anybody trained a multi-output XGBoost model on a Mac arm64 machine?

On the recent stable version I got the error: XGBoostError('[...] Check failed: !trees.front()->IsMultiTarget(): Update tree leaf support for multi-target tree is not yet implemented.

On the latest nightly version (xgboost-2.1.0.dev0+a7226c02223246be78a59c3a4e8c32d1c68c1ff9), the CPU was under load, but there was no feedback in the terminal window.

Reederey87, Dec 28 '23

Is the vector-leaf-based multi-output model still a work in progress? Also, which research paper, and which splitting mechanism for decision trees, is this work based on? @trivialfis

aniruddhghatpande, Mar 20 '24

Yes, it's still a work in progress.

trivialfis, Mar 25 '24

Hi @trivialfis,

I'm currently working on some models using XGBoostLSS, which, as far as I understand, is based on the multi-output feature of XGBoost. I wonder how monotonic constraints are handled in the multi-output case. It seems the constraints are shared among the trees built for each target; could you confirm?

Thanks for your work on this feature !

mxdub, Mar 25 '24