Dataset revision suggestions to diversify human images (currently 90+% white)

Open katherinerosewolf opened this issue 4 years ago • 0 comments

The datasets look good except in notebook 2, where the people in the human image training and validation datasets appear to be roughly 90% or more white (a classic problem with a lot of this research: https://www.aaihs.org/race-after-technology/).

I didn't feel quite comfortable enough with the subject matter to revise the code to incorporate new datasets, but this could be a less white option: https://paperswithcode.com/dataset/fairface
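For anyone picking this up, here is a minimal sketch of how the balance of a candidate dataset could be checked before swapping it in. It assumes the dataset ships a labels CSV with one row per image and a column naming the group; the column name `race`, the sample rows, and the helper `group_shares` are all hypothetical stand-ins, not taken from the notebooks or from any particular dataset's documentation.

```python
import csv
import io
from collections import Counter


def group_shares(label_rows, column="race"):
    """Return each group's share of the rows, as a fraction of the total."""
    counts = Counter(row[column] for row in label_rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Tiny in-memory stand-in for a labels file; the real file name and
# column names are assumptions and would need to be checked.
sample_csv = io.StringIO(
    "file,race\n"
    "a.jpg,White\n"
    "b.jpg,Black\n"
    "c.jpg,East Asian\n"
    "d.jpg,White\n"
)

shares = group_shares(csv.DictReader(sample_csv))

# Print groups from largest to smallest share.
for group, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {share:.0%}")
```

Running the same function on the labels for the current training and validation splits (if labels exist) would turn the rough "~90%" impression above into an actual number.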

Since those are images of faces rather than full people, though, we could then compare it with this dog face dataset: https://github.com/GuillaumeMougeot/DogFaceNet (possibly; I'm not sure whether the model would pick up on white space differences more than on the faces themselves).

It might also be great to incorporate into the slide deck some of the recent literature on how machine learning output can reproduce or magnify racist patterns in the training data.

Aug 26 '21 06:08 katherinerosewolf