
Corrected German credit data

Open nrkarthikeyan opened this issue 3 years ago • 3 comments

The widely used German credit dataset (already available in the toolkit) apparently has coding errors, so consider including the corrected version: https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29

http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf

nrkarthikeyan avatar Aug 11 '22 13:08 nrkarthikeyan

Tasks:

  • [ ] Ensure the license permits open source use.
  • [ ] Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns, etc.).
  • [ ] Ensure we have instance level records with protected attributes and outcomes.
  • [ ] First create an sklearn-compatible dataset (DataFrame), then an appropriate "classic" dataset (second priority).
  • [ ] Create a simple notebook where the dataset is consumed and, at a minimum, simple fairness measures are computed.
  • [ ] Do NOT bundle the data in the repository. Instead, include a function that downloads it, since the data is not hosted in AIF360.
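
For the notebook task, the two simplest group fairness measures can be sketched without the toolkit. This is only an illustration: the `group`/`label` field names and the records are made up, not taken from the South German Credit data, and a real notebook would presumably use the toolkit's own metric classes instead.

```python
def favorable_rate(rows, group):
    """Share of favorable outcomes (label == 1) among rows in the given group."""
    labels = [r["label"] for r in rows if r["group"] == group]
    return sum(labels) / len(labels)


def statistical_parity_difference(rows, unprivileged, privileged):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return favorable_rate(rows, unprivileged) - favorable_rate(rows, privileged)


def disparate_impact(rows, unprivileged, privileged):
    """Favorable rate of the unprivileged group divided by that of the privileged group."""
    return favorable_rate(rows, unprivileged) / favorable_rate(rows, privileged)


# Hypothetical toy records: one protected attribute and one binary outcome each.
rows = [
    {"group": "young", "label": 1},
    {"group": "young", "label": 0},
    {"group": "young", "label": 0},
    {"group": "young", "label": 1},
    {"group": "old", "label": 1},
    {"group": "old", "label": 1},
    {"group": "old", "label": 1},
    {"group": "old", "label": 0},
]

spd = statistical_parity_difference(rows, unprivileged="young", privileged="old")
di = disparate_impact(rows, unprivileged="young", privileged="old")
print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
```

A value of 0 for the difference (or 1 for the ratio) would indicate equal favorable rates across the two groups.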

nrkarthikeyan avatar Sep 15 '22 20:09 nrkarthikeyan

I was working on Colab and also ran into this error in the German Credit notebook: aif360 gave me instructions to download two files and move them to a folder. Running this code solved it:

%pip install wget
import os
import wget

# Folder where aif360 looks for the raw German Credit files.
# This path matches a Python 3.8 install; adjust it for your environment.
output_directory = "/usr/local/lib/python3.8/dist-packages/aif360/data/raw/german"
os.makedirs(output_directory, exist_ok=True)

german_data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
german_doc_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc"

# Download both files straight into that folder.
german_data = wget.download(german_data_url, out=output_directory)
german_doc = wget.download(german_doc_url, out=output_directory)
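
The hard-coded python3.8 path only works for that interpreter version. A version-independent variant might look like the sketch below. It uses only the standard library; the helper names `aif360_german_dir` and `download_german` are made up for illustration, and it assumes aif360 expects the files under `data/raw/german` inside its package folder, as in the path above.

```python
import importlib.util
import os
import urllib.request

UCI_BASE = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/"
FILES = ("german.data", "german.doc")


def aif360_german_dir():
    """Return <aif360 package dir>/data/raw/german, or None if aif360 is not installed."""
    spec = importlib.util.find_spec("aif360")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "data", "raw", "german")


def download_german(dest_dir, fetch=urllib.request.urlretrieve):
    """Download the raw files into dest_dir, skipping any that already exist.

    `fetch` is any callable taking (url, target_path); it defaults to
    urllib.request.urlretrieve and can be swapped out for testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in FILES:
        target = os.path.join(dest_dir, name)
        if not os.path.exists(target):
            fetch(UCI_BASE + name, target)
        saved.append(target)
    return saved


# Example usage (needs network access and write permission to the package folder):
# dest = aif360_german_dir()
# if dest is not None:
#     download_german(dest)
```

Because existing files are skipped, re-running the cell in a fresh Colab session is harmless.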

Ricardo-OB avatar Jan 22 '23 04:01 Ricardo-OB