[CTGAN] Treat boolean columns similarly to categorical

Open npatki opened this issue 3 years ago • 0 comments

Problem Description

CTGAN is designed to one-hot encode categorical values so that the model learns to synthesize every possible category. Without this special handling, the model may learn to generate only the most frequent category.

CTGAN already does this for categorical columns, but boolean columns should get the same treatment. Otherwise, CTGAN has been observed to not learn to synthesize both possible boolean values. (See #580)
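To make the request concrete, here is a minimal sketch of what treating a boolean column as a two-category column could look like. It uses only plain Python and is not CTGAN's or SDV's actual code; the helper names `one_hot_encode` and `one_hot_decode` and the `is_member` column are hypothetical, chosen for illustration.

```python
def one_hot_encode(values):
    """Encode each value as a one-hot vector over the observed categories.

    Hypothetical helper for illustration; not part of CTGAN or SDV.
    """
    # Keep first-seen order so the category layout is deterministic.
    categories = list(dict.fromkeys(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def one_hot_decode(categories, rows):
    """Map each vector back to the category with the largest score."""
    return [
        categories[max(range(len(row)), key=row.__getitem__)]
        for row in rows
    ]


# An imbalanced boolean column: False is the rare value.
is_member = [True, True, True, False, True]

categories, encoded = one_hot_encode(is_member)
decoded = one_hot_decode(categories, encoded)
```

With this encoding, `True` and `False` each get their own output slot, the same way each category of a categorical column does, rather than being collapsed into a single numeric 0/1 column.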

npatki, Jul 07 '22 21:07