pummeler
Featurization issues
- MIGPUMA has joint meaning with MIGSP; same for POWPUMA/POWSP.
- Why does RELP come up so much in the ridge models? What does it mean in practice?
- CITWP, YOEP, JWMNP: mean-coding blanks might not be the right thing, since blank means the person was born in the US / doesn't work
- MLP* (when served in military) could probably be simplified; VPS does that
- NWAB, NWAV, NWLA, NWLK, NWRE are recoded into ESR
- RELP (relationship to reference person) is kind of a weird feature
- hierarchical featurization for ANC_P / FOD_P / INDP / NAICSP / OCCP / SOCP?
- for NAICS/SOC see https://www.census.gov/people/io/methodology/indexes.html
- merge ANC1P/ANC2P, RAC1P/RAC2P/...?
- re-featurize JWAP/JWDP to be circular?
- Things that refer to specific in-US places: MIGPUMA, MIGSP, POBP, POWPUMA, POWSP
- POVPIP: pretty sharp featurization difference between 500 and 501
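
For the JWAP/JWDP item, one possible "circular" featurization is to map each time of day to a point on the unit circle, so that times just before and just after midnight end up close together. This is only a sketch: `circular_features` is a hypothetical helper, not part of pummeler, and it assumes the JWAP/JWDP codes have already been converted to minutes after midnight (the raw PUMS values are interval codes, not minutes).

```python
import math


def circular_features(minutes, period=24 * 60):
    """Map times of day to (sin, cos) pairs on the unit circle.

    `minutes` is assumed to be minutes after midnight; `period` is the
    length of one full cycle in the same units.
    """
    feats = []
    for m in minutes:
        # Wrap into [0, period) and convert to an angle in [0, 2*pi).
        angle = 2 * math.pi * (m % period) / period
        feats.append((math.sin(angle), math.cos(angle)))
    return feats


# 23:55 and 00:05 are far apart as raw numbers but adjacent on the circle.
late, early = circular_features([23 * 60 + 55, 5])
gap = math.dist(late, early)
```

A linear model then sees two columns per variable instead of one, which costs a feature but removes the artificial jump at midnight.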
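
For the CITWP / YOEP / JWMNP item, one option (an assumption here, not something pummeler is known to do) is to keep the mean-coding but add an explicit "was blank" indicator, so a model can learn a separate effect for "born in the US" or "doesn't work" instead of treating those people as average. A minimal sketch, with `mean_code_with_indicator` as a hypothetical helper and `None` standing in for a blank:

```python
def mean_code_with_indicator(values):
    """Return (filled_value, is_blank) pairs for a column with blanks.

    Blanks (None) are replaced by the mean of the observed values, and
    the second element flags which entries were originally blank.
    """
    observed = [v for v in values if v is not None]
    # Fall back to 0.0 if the whole column is blank.
    mean = sum(observed) / len(observed) if observed else 0.0
    return [
        (mean if v is None else float(v), 1.0 if v is None else 0.0)
        for v in values
    ]


# Example: travel times in minutes, with one person who doesn't work.
coded = mean_code_with_indicator([10, None, 30])
```

With the indicator present, the filled-in mean mostly serves to keep the numeric column centered; the blank group's effect is carried by the indicator column.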