introduction_to_ml_with_python

Problem with Boston Housing Data

Open sdempwolf opened this issue 3 years ago • 5 comments

Hello. On Sep 28, 2022, I was working with the Boston Housing data and the exercises in module 02, supervised-learning. We received a message that there was an ethical problem with the Boston Housing data and that scikit-learn recommended switching to the California Housing data, for which it provided links. I ended up modifying the mglearn/datasets.py file, adding the import line and a function load_extended_california(). This allows the rest of the code in the notebook to run as written with the California housing data.

from sklearn.datasets import fetch_california_housing

def load_extended_california():
    housing = fetch_california_housing()
    X = housing.data

    X = MinMaxScaler().fit_transform(housing.data)
    X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    return X, housing.target
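
For readers who want to see what the two preprocessing steps in that function do without downloading anything, here is a minimal sketch. It assumes scikit-learn and NumPy are installed. The helper name extend_features and the random stand-in data are hypothetical and not part of mglearn. The stand-in has 8 columns, the same number of features as the California housing data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures


def extend_features(X):
    """Scale each column to [0, 1], then add all degree-2 polynomial terms.

    Hypothetical helper: it mirrors the two transforms used in
    load_extended_california() but takes any 2-D array as input.
    """
    X_scaled = MinMaxScaler().fit_transform(X)
    return PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_scaled)


# Synthetic stand-in data (assumption): 100 rows, 8 columns.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))

X_ext = extend_features(X_demo)

# 8 original columns + 8 squares + 28 pairwise products = 44 columns.
print(X_ext.shape)
```

The extended Boston data in the book has 104 columns because that dataset has 13 features. With 8 features the same expansion gives 44, so any notebook text or output that mentions the feature count will differ after the swap.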

sdempwolf avatar Sep 30 '22 14:09 sdempwolf

Hi! Yes, I was part of the discussion of making that change in sklearn. Since the book is using this dataset, the repo will continue to use that dataset. If I end up revising the book (somewhat unlikely at this point), I will replace the dataset.

amueller avatar Oct 18 '22 18:10 amueller

> Hi! Yes, I was part of the discussion of making that change in sklearn. Since the book is using this dataset, the repo will continue to use that dataset. If I end up revising the book (somewhat unlikely at this point), I will replace the dataset.

Hi Andreas, I love using your book & notebooks in my classes. However, I don't want to have to revert to sklearn <1.2. I tried simply replacing the references to the Boston housing dataset with the California housing data, but was unsuccessful. Can you please point me to the files where this change needs to occur? I must be missing one somehow. Or will this approach just not work?

rsrenner avatar Mar 26 '23 23:03 rsrenner

Please update the mglearn library; that should solve the issue.
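
Before and after updating, it can help to confirm which versions are actually installed in the active environment. The following is a small sketch using only the standard library. The helper name installed_version is hypothetical, and the thread does not say which mglearn release contains the fix, so no minimum version is checked here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("scikit-learn", "mglearn"):
    print(name, installed_version(name) or "not installed")
```

If mglearn turns out to be outdated, running pip install --upgrade mglearn in the same environment is the usual way to update it.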

amueller avatar Jun 01 '23 17:06 amueller