AIF360
Corrected German credit data
The widely used German credit data (already available in the toolkit) apparently has coding errors, so consider including the corrected version: https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29
See also: http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf
Tasks:
- [ ] Ensure the license permits open source use.
- [ ] Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns etc.).
- [ ] Ensure we have instance level records with protected attributes and outcomes.
- [ ] First create an sklearn-compatible dataset (dataframe), then an appropriate "classic" dataset (second priority).
- [ ] Create a simple notebook where the dataset is consumed and at least simple fairness measures are computed.
- [ ] DO NOT download and incorporate the data; instead, include a function that will do this, since the data is not hosted in AIF360.
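As a rough illustration of the "simple fairness measures" task above, here is a minimal sketch in plain Python of two common group fairness measures, statistical parity difference and disparate impact. The records, the column names `sex` and `credit`, and the choice of privileged group are made-up placeholders, not values from the South German Credit data; a real notebook would presumably use the toolkit's own metric classes instead of hand-written helpers like these.

```python
def favorable_rate(records, protected, group, label, favorable=1):
    """Share of records in `group` that received the favorable outcome."""
    members = [r for r in records if r[protected] == group]
    if not members:
        raise ValueError(f"no records with {protected}={group!r}")
    return sum(r[label] == favorable for r in members) / len(members)


def statistical_parity_difference(records, protected, privileged, unprivileged, label):
    """P(favorable | unprivileged) - P(favorable | privileged); 0 means parity."""
    return (favorable_rate(records, protected, unprivileged, label)
            - favorable_rate(records, protected, privileged, label))


def disparate_impact(records, protected, privileged, unprivileged, label):
    """P(favorable | unprivileged) / P(favorable | privileged); 1 means parity."""
    privileged_rate = favorable_rate(records, protected, privileged, label)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(records, protected, unprivileged, label) / privileged_rate


# Hypothetical toy records (NOT real data): one protected attribute, one binary outcome.
toy_records = [
    {"sex": "male", "credit": 1},
    {"sex": "male", "credit": 1},
    {"sex": "male", "credit": 1},
    {"sex": "male", "credit": 0},
    {"sex": "female", "credit": 1},
    {"sex": "female", "credit": 1},
    {"sex": "female", "credit": 0},
    {"sex": "female", "credit": 0},
]

spd = statistical_parity_difference(toy_records, "sex", "male", "female", "credit")
di = disparate_impact(toy_records, "sex", "male", "female", "credit")
print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
```

Both helpers only need instance-level records with a protected attribute and an outcome, which is why the third task in the list matters.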
I was working on Colab and also ran into this error on the German Credit notebook: aif360 gave me instructions to download two files and move them to a folder. Running this code solved it:
%pip install wget
import os
import wget
import aif360

# Locate the installed aif360 package instead of hardcoding the Python version in the path.
output_directory = os.path.join(os.path.dirname(aif360.__file__), "data", "raw", "german")
os.makedirs(output_directory, exist_ok=True)
german_data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
german_doc_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc"
# Download both files into the folder where aif360 looks for them.
german_data = wget.download(german_data_url, out=output_directory)
german_doc = wget.download(german_doc_url, out=output_directory)
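The same idea could be wrapped in a small reusable helper that uses only the standard library and skips files that are already present. This is just a sketch: `download_if_missing` and `fetch_german_raw_files` are hypothetical names, not part of AIF360, and the two URLs are the ones from the snippet above.

```python
import os
import posixpath
import urllib.parse
import urllib.request

# URLs taken from the snippet above.
GERMAN_URLS = [
    "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
    "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc",
]


def download_if_missing(url, dest_dir, filename=None):
    """Download `url` into `dest_dir` unless the file already exists; return its path."""
    if filename is None:
        # Use the last component of the URL path as the file name.
        filename = posixpath.basename(urllib.parse.urlsplit(url).path)
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, filename)
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(url, dest_path)
    return dest_path


def fetch_german_raw_files(dest_dir):
    """Hypothetical convenience wrapper: fetch both German credit files into `dest_dir`."""
    return [download_if_missing(url, dest_dir) for url in GERMAN_URLS]
```

Calling `fetch_german_raw_files(...)` with the `data/raw/german` folder of the installed aif360 package should have the same effect as the snippet above, and calling it a second time does not download anything again.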