pummeler issues

sorting into states directly is slow

3

Seems like maybe pandas/pytables append is a lot slower than writing into a new file. (Or else the rewriting-when-strings-are-longer code is hitting a lot.) The sort step should probably pre-count...

djsutherland

sort: means/stds, value counts ignore person weighting

This has the strange effect that, e.g., the mean standardized `PINCP` across the US is `-0.16`. Probably not a huge deal, but still.

djsutherland
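
For the weighting issue above, here is a minimal sketch of what weighted means/stds could look like. It is not pummeler's code: the function name `weighted_mean_std` and the toy arrays are made up for illustration, and it assumes each record carries a nonnegative person weight (in the ACS PUMS files that would be something like `PWGTP`).

```python
import numpy as np


def weighted_mean_std(values, weights):
    """Weighted mean and (population-style) weighted std of `values`.

    Hypothetical helper for illustration; `weights` are assumed to be
    nonnegative person weights, one per record.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)


# Toy data: three records standing in for 1, 1, and 8 people.
incomes = np.array([10.0, 20.0, 60.0])
person_weights = np.array([1.0, 1.0, 8.0])

# Standardizing with unweighted statistics leaves a nonzero weighted mean.
unweighted_z = (incomes - incomes.mean()) / incomes.std()
biased_mean = np.average(unweighted_z, weights=person_weights)

# Standardizing with weighted statistics centers the weighted mean at zero.
w_mean, w_std = weighted_mean_std(incomes, person_weights)
weighted_z = (incomes - w_mean) / w_std
centered_mean = np.average(weighted_z, weights=person_weights)
```

With weighted statistics the person-weighted mean of the standardized column is zero by construction. The unweighted version leaves an offset, which is the same kind of effect as the `-0.16` reported above.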

dask?

Seems like this might be a decent use-case for dask.

djsutherland

Featurization issues

1

- `MIGPUMA` has joint meaning with `MIGSP`; same for `POWPUMA`/`POWSP`.
- Why does `RELP` come up so much in the ridge models? What does it mean in practice?

djsutherland

Efficient Bayesian ridge regression

2

Using 100 KDE features and all the categorical variables, I end up with a dataset that's `840x6578` so I'm inclined to do ridge regression. I tried to implement it in...

flaxter
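
For a wide dataset like the `840x6578` one mentioned above (far more columns than rows), one standard trick is to solve ridge regression in its dual form, so that the linear system is `n x n` rather than `p x p`. The sketch below is not what the issue implemented (the snippet is truncated). The function names and random data are made up for illustration. The result is the ridge estimate, which is also the posterior mean under a zero-mean isotropic Gaussian prior on the weights, with `lam` being the ratio of noise variance to prior variance.

```python
import numpy as np


def ridge_dual(X, y, lam):
    """Ridge weights via the dual form: w = X^T (X X^T + lam I)^{-1} y.

    Solves an n x n system, which is cheaper when n << p.
    """
    n = X.shape[0]
    gram = X @ X.T
    alpha = np.linalg.solve(gram + lam * np.eye(n), y)
    return X.T @ alpha


def ridge_primal(X, y, lam):
    """Ridge weights via the usual form: w = (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Small random stand-in for a wide design matrix (n=20 rows, p=50 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = rng.normal(size=20)

w_dual = ridge_dual(X, y, lam=1.0)
w_primal = ridge_primal(X, y, lam=1.0)
```

Both functions return the same weights up to floating-point error; the dual form just avoids forming and factoring the `p x p` matrix.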

Log transform for US$ variables

6

Here are the variables I think we should log-transform, all representing income/wages/etc.: `VERSIONS = { ... 'log_transform_feats': '''INTP OIP PAP RETP SEMP SSIP SSP WAGP PERNP PINCP'''.split(),` Only issue is...

flaxter
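
A rough sketch of how a transform over the listed dollar variables might be applied. Everything beyond the variable list is an assumption: `signed_log1p` and `log_transform_columns` are hypothetical helpers, and a signed `log1p` is used here only because it stays defined at zero and for negative amounts. The issue text is truncated, so this is not necessarily the transform that was settled on.

```python
import numpy as np

# Variable list taken from the issue text above.
LOG_TRANSFORM_FEATS = '''INTP OIP PAP RETP SEMP SSIP SSP WAGP PERNP PINCP'''.split()


def signed_log1p(x):
    """sign(x) * log(1 + |x|): defined for zero and negative values."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


def log_transform_columns(columns, feats=LOG_TRANSFORM_FEATS):
    """Return a new dict (name -> array) with the columns in `feats` transformed.

    Columns not named in `feats` are passed through unchanged.
    """
    return {
        name: signed_log1p(vals) if name in feats else np.asarray(vals)
        for name, vals in columns.items()
    }


# Toy records: one dollar-valued column and one column left alone.
toy = {"WAGP": [0, 9, 99999], "AGEP": [30, 41, 52]}
out = log_transform_columns(toy)
```

Here `out["WAGP"]` is transformed while `out["AGEP"]` is untouched, since only names in `LOG_TRANSFORM_FEATS` are affected.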

faster featurizer

1

The old Cython featurizer only took two minutes on low1 once dummies had been created; this new one takes two hours. Dunno how long dummies took, but not two hours....

djsutherland

enhancement

election data

analysis code

Featurizer not working (other general issues with the package)
