DecisionTreeDiscretiser to output integers in addition to the predictions
At the moment, the DecisionTreeDiscretiser returns the values of the tree predictions as the replacement of the original variables.
I would like to add the option to return integers from 1 to k (or 0 to k-1), where k is the number of final leaves. The integers should increase with the mean target value per leaf.
I am not sure how difficult this is to implement. I think we need to navigate the tree, pick up the final values at the leaves, and then create a mapping from each final value to an integer. We would also add a parameter in the init where the user can specify whether they want integers or predictions as output.
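A minimal sketch of the leaf-to-integer mapping described above, written against a plain scikit-learn DecisionTreeRegressor rather than the DecisionTreeDiscretiser itself. The toy data and the names leaf_to_bin and bins are hypothetical; this is one possible approach, not the library's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumption): one feature, target roughly increasing with it.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Leaves are the nodes without children (children_left == -1).
leaf_ids = np.flatnonzero(tree.tree_.children_left == -1)

# For a single-output regressor, tree_.value[node, 0, 0] is the mean target in that node.
leaf_means = tree.tree_.value[leaf_ids, 0, 0]

# Rank the leaves by mean target: lowest mean -> 0, highest mean -> k-1.
order = np.argsort(leaf_means)
leaf_to_bin = {leaf_ids[i]: rank for rank, i in enumerate(order)}

# tree.apply returns the leaf each sample falls into; map it to its integer.
bins = np.array([leaf_to_bin[leaf] for leaf in tree.apply(X)])
```

Adding 1 to bins would give the 1 to k variant. Note that two leaves with exactly the same mean would still receive different integers in this sketch.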
This addition is inspired by the Self-Guided via CART method available in MINITAB and described here.
Really nice one.
I have also been using the DecisionTreeDiscretiser to bin a model's predicted probabilities, in order to define rules of action depending on the generated bin.
For that case, it would be nice to have a return_boundaries option among the parameters of DecisionTreeDiscretiser.

Hi @joaopcnogueira
Thanks for the detail and the notebook you contributed to the examples repo. And apologies for the delay. I was on holidays.
Would you like to give it a go at making the discretizer return the interval boundaries?
These links may help:
https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py
https://mljar.com/blog/extract-rules-decision-tree/
https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree
No need to apologize, I completely understand.
I will try to make the discretizer return the interval boundaries, most likely as shown in the notebook in the examples repo.
If the decision tree's random seed is fixed, it will always return the same prediction values, so we can use something like stats.rankdata(x, method='dense') to transform the floats into integers.
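To illustrate the rankdata idea, here is a small sketch on made-up prediction values (the arrays are hypothetical). One caveat, as far as I can tell: rankdata only ranks the values present in the array it is given, so a batch passed at transform time that misses some leaves would be ranked differently. The second part shows one possible workaround, learning the sorted unique values at fit time and looking them up later.

```python
import numpy as np
from scipy import stats

# Hypothetical tree predictions: one distinct value per leaf.
preds = np.array([0.75, 0.10, 0.40, 0.10, 0.75, 0.92])

# Dense ranking: equal predictions share a rank and ranks have no gaps.
bins = stats.rankdata(preds, method="dense").astype(int)
print(bins)  # [3 1 2 1 3 4]

# Possible workaround for new data: store the sorted unique values seen
# at fit time, then look up each new prediction's position among them.
fit_values = np.unique(preds)
new_preds = np.array([0.40, 0.92])
new_bins = np.searchsorted(fit_values, new_preds) + 1
print(new_bins)  # [2 4]
```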
I posted this question on Stack Overflow to see how to obtain interval boundaries from trees: https://stackoverflow.com/questions/75663472/how-to-obtain-the-interval-limits-from-a-decision-tree-with-scikit-learn
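In the meantime, a rough sketch of one way the interval limits could be read off a fitted tree, assuming a single input variable (as in the discretiser, where one tree is fit per variable). It uses a plain scikit-learn DecisionTreeRegressor on toy data; the names boundaries and bin_idx are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumption): a single variable to discretise.
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=200)
y = 2 * x + rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(x.reshape(-1, 1), y)

# Internal nodes have feature >= 0 and carry a split threshold;
# leaves are marked with a negative feature index.
internal = tree.tree_.feature >= 0
thresholds = np.sort(tree.tree_.threshold[internal])

# With one variable, the sorted thresholds are the interval limits.
boundaries = np.concatenate(([-np.inf], thresholds, [np.inf]))

# scikit-learn sends samples with x <= threshold to the left child,
# so intervals are closed on the right: (lower, upper].
bin_idx = np.digitize(x, thresholds, right=True)
```

Because the thresholds are sorted, bin_idx is ordered by the variable's value, not by the mean target per leaf; the two orderings coincide only when the target is monotonic in the variable.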