
Download LawSchool dataset directly from SEAPHE

Open hoffmansc opened this issue 3 years ago • 1 comments

http://www.seaphe.org/databases.php

This way we can remove the dependency on tempeh. We can essentially copy this file (preserving the copyright notice): https://github.com/microsoft/tempeh/blob/main/tempeh/datasets/seaphe_datasets.py

See also meps_datasets.py for another example of downloading/unzipping.

Relevant files: tempeh_datasets.py, law_school_gpa_dataset.py

See demo_grid_search_reduction_regression_sklearn.ipynb for example usage.

Behavior should be essentially the same as tempeh, except that rows with NAs should be kept; dropping them can be handled later.

hoffmansc avatar Aug 29 '22 14:08 hoffmansc

Possible Tasks:

  • [x] Ensure the license permits open source use
  • [x] Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns etc.)
  • [x] Ensure we have instance level records with protected attributes and outcomes
  • [ ] First create an sklearn-compatible dataset (dataframe), then an appropriate "classic" dataset (second priority)
  • [x] Create a simple notebook where the dataset is consumed and, at a minimum, simple fairness measures are computed.
  • [ ] DO NOT download and incorporate the data; instead, include a function that downloads it, since the data is not hosted in AIF360.
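
For the last task, a minimal sketch of a download-on-demand helper might look like the following. It is not the tempeh or AIF360 implementation: `fetch_csv`, `load_rows`, the `opener` parameter, the example URL, and the sample columns are all hypothetical names chosen for illustration, and the real file location on SEAPHE is not assumed here. The sketch caches the file locally on first use and keeps rows with missing values (mapped to `None`) rather than dropping them.

```python
import csv
import io
import tempfile
import urllib.request
from pathlib import Path


def fetch_csv(url, cache_dir, filename, opener=urllib.request.urlopen):
    """Download `url` to `cache_dir/filename` unless it is already cached.

    `opener` is a hypothetical injection point so the download step can be
    replaced in tests; by default it is urllib's standard `urlopen`.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        with opener(url) as response:
            target.write_bytes(response.read())
    return target


def load_rows(path, na_values=("", "NA")):
    """Read the CSV into a list of dicts, keeping rows that contain NAs.

    Missing markers are mapped to None; no row is dropped at load time.
    """
    with open(path, newline="") as f:
        return [
            {key: (None if value in na_values else value)
             for key, value in row.items()}
            for row in csv.DictReader(f)
        ]


# Offline demonstration: a stand-in opener returns made-up sample bytes,
# so nothing is fetched from the network. URL and columns are placeholders.
def fake_opener(url):
    return io.BytesIO(b"race,lsat,ugpa,zfygpa\nwhite,38,3.3,NA\nblack,34,,0.5\n")


with tempfile.TemporaryDirectory() as tmp:
    path = fetch_csv("https://example.invalid/lawschool.csv", tmp,
                     "lawschool.csv", opener=fake_opener)
    rows = load_rows(path)
```

Keeping the download behind a function means the data is never committed to the repository, and deferring NA handling to the caller matches the behavior requested above.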

anupamamurthi avatar Sep 15 '22 17:09 anupamamurthi

Please assign me this issue.

EktaBhaskar avatar Sep 22 '23 19:09 EktaBhaskar

Can I get this issue assigned?

vandanapathare avatar Sep 22 '23 21:09 vandanapathare