Download LawSchool dataset directly from SEAPHE
http://www.seaphe.org/databases.php
This way we can remove the dependency on tempeh. We can essentially copy this file (preserving the copyright notice): https://github.com/microsoft/tempeh/blob/main/tempeh/datasets/seaphe_datasets.py
See also meps_datasets.py for another example of downloading/unzipping.
Relevant files: tempeh_datasets.py, law_school_gpa_dataset.py
See demo_grid_search_reduction_regression_sklearn.ipynb for example usage.
Behavior should be essentially the same as tempeh's, except that rows with NAs should be kept, since dropping them can be handled later.
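A minimal sketch of the download/unzip step, using only the standard library. The function name `fetch_and_extract`, its parameters, and the caching layout are hypothetical and not taken from tempeh or meps_datasets.py. The actual archive URL and member name should come from seaphe_datasets.py, so they are left as arguments here.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, member, cache_dir):
    """Download the zip archive at `url` and return the path of the extracted `member`.

    Hypothetical helper: nothing is parsed or filtered here, so missing values
    in the raw file are left untouched for the caller to handle later.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / member
    if target.exists():
        # Already extracted on a previous call, so skip the download.
        return target

    archive = cache_dir / "download.zip"
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)

    with zipfile.ZipFile(archive) as zf:
        zf.extract(member, path=cache_dir)
    archive.unlink()  # keep only the extracted file in the cache
    return target
```

The returned path could then be read into a dataframe by the dataset class, without calling `dropna()` at this stage.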
Possible Tasks:
- [x] Ensure the license permits open source use
- [x] Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns, etc.)
- [x] Ensure we have instance-level records with protected attributes and outcomes
- [ ] First create sklearn-compatible dataset (dataframe) and an appropriate "classic" dataset (second priority)
- [x] Create a simple notebook where the dataset is consumed and at least simple fairness measures are computed.
- [ ] DO NOT download and incorporate the data, rather include a function that will do this since data is not hosted in AIF360.
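For the notebook task above, one simple fairness measure for a numeric outcome is the gap in mean outcome between groups. The sketch below is plain Python and illustrative only: `group_mean_gap` is a hypothetical helper, not an AIF360 function, and it skips missing values (`None`) rather than assuming they were dropped earlier.

```python
def group_mean_gap(outcomes, groups, privileged):
    """Mean outcome of the unprivileged records minus that of the privileged ones.

    Records with a missing outcome or a missing group label are skipped.
    """
    priv = [y for y, g in zip(outcomes, groups)
            if y is not None and g == privileged]
    unpriv = [y for y, g in zip(outcomes, groups)
              if y is not None and g is not None and g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups need at least one non-missing record")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


# Toy values, not taken from the Law School data.
gap = group_mean_gap(
    outcomes=[3.0, 4.0, 2.0, None, 3.0],
    groups=["a", "a", "b", "b", "b"],
    privileged="a",
)
```

A gap near zero suggests similar average outcomes across groups. In the notebook, the equivalent AIF360 metric would likely be used instead.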
Please assign me this issue.
Can I get this issue assigned?