scikit-learn icon indicating copy to clipboard operation
scikit-learn copied to clipboard

BUG: tree/forest regressor: impurity decrease calculation is wrong for criterion "friedman_mse"

Open cakedev0 opened this issue 2 months ago • 1 comments

Describe the bug

Well, everything is in the title.

I noticed that while writing the issue #32700

I'm opening this issue just for the records, as we plan to remove "friedman_mse" criterion anyway.

Steps/Code to Reproduce

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.arange(5).reshape(-1, 1)
y = [0, 1, 0, 1, 0]

reg = DecisionTreeRegressor(min_impurity_decrease=0.1, max_depth=1, criterion="friedman_mse")
reg.fit(X, y)

weighted_impurity = reg.tree_.impurity * reg.tree_.n_node_samples / X.shape[0]
actual_decrease = weighted_impurity[0] - weighted_impurity[1:3].sum()

assert actual_decrease >= 0.1

Or simply running the test sklearn/tree/tests/test_tree.py::test_min_impurity_decrease with criterion "friedman_mse"

Expected Results

No error

Actual Results

Assertion error

Versions

System:
    python: 3.12.11 (main, Aug 18 2025, 19:19:11) [Clang 20.1.4 ]
executable: /home/arthur/dev-perso/scikit-learn/sklearn-env/bin/python
   machine: Linux-6.14.0-35-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.8.dev0
          pip: None
   setuptools: 80.9.0
        numpy: 2.3.4
        scipy: 1.16.2
       Cython: 3.1.5
       pandas: 2.3.3
   matplotlib: 3.10.7
       joblib: 1.5.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-8fb3d286.so
        version: 0.3.30
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-b75cc656.so
        version: 0.3.29.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None

cakedev0 avatar Nov 13 '25 16:11 cakedev0

Follow-up from this issue: the test test_min_impurity_decrease should test all criteria. My PR https://github.com/scikit-learn/scikit-learn/pull/32699 does that.

cakedev0 avatar Nov 13 '25 16:11 cakedev0