full scale test
We should probably synthesize the population of the Bay Area and resolve any issues that come up. If it's fast enough, we should go for the whole country (why not?).
@jiffyclub I gave this a shot. There was at least one block group that needed 15K iterations in the IPU. When I upped it to 20K iterations, the Bay Area completed successfully. Right now it's running in about 40 minutes. Checking the results seems like the next order of business.
Nice! A couple thoughts:
- I wonder if the convergence criteria in the IPU could be loosened a bit without affecting the final results.
- I wonder if addressing the zero-cell thing would make it easier to reach convergence in the IPU.
- If we want it to be even faster we can experiment with numba and cython.
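To make the first point concrete, here is a minimal toy sketch of an iterative proportional updating loop, showing how a convergence tolerance and an iteration cap interact. It is not synthpop's implementation: the function name `ipu_sketch`, the parameters `tol` and `max_iterations`, and the stopping rule (change in mean relative error between passes) are all assumptions made for illustration.

```python
import numpy as np


def ipu_sketch(freq, targets, tol=1e-4, max_iterations=20000):
    """Toy IPU loop (hypothetical helper, not synthpop's code).

    freq:    rows are sample households, columns are constraints
             (household types and person types), values are counts.
    targets: marginal total to match for each constraint column.
    Returns (weights, fit, iterations_used).
    """
    freq = np.asarray(freq, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.ones(freq.shape[0])

    def fit(w):
        # Mean absolute relative error across all constraints.
        return np.mean(np.abs(freq.T @ w - targets) / targets)

    previous = fit(weights)
    current = previous
    for iteration in range(1, max_iterations + 1):
        # One pass: rescale the households that contribute to each constraint.
        for j in range(freq.shape[1]):
            column = freq[:, j]
            total = weights @ column
            if total > 0:
                weights[column > 0] *= targets[j] / total
        current = fit(weights)
        # Assumed stopping rule: the fit stopped improving by more than tol.
        if abs(previous - current) < tol:
            return weights, current, iteration
        previous = current
    return weights, current, max_iterations


# Four sample households; columns: hh type A, hh type B, person type 1, person type 2.
freq = [[1, 0, 1, 0],
        [1, 0, 0, 1],
        [0, 1, 1, 1],
        [0, 1, 2, 0]]
targets = [5, 5, 11, 4]

for tol in (1e-2, 1e-10):
    weights, fit_quality, iterations = ipu_sketch(freq, targets, tol=tol)
    print(tol, iterations, fit_quality)
```

With this stopping rule, a looser `tol` can never take more passes than a tighter one on the same input, so comparing the resulting weights at two tolerances is one way to check whether the extra iterations change the final results.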
As we work on the validation we'll want to track everything we do so it can be publicized. I dunno if maybe a separate repo would be good for that, or if we keep it in this one somewhere.
I wonder these things too - we can definitely try it and see.
I agree on publicized validation - I vote for keeping it in this repo - maybe with a notebook (or more than one) that's well annotated, I would guess in a separate directory.
Nice progress. Does the 40 minutes include sampling the household and person records and writing out the resulting synthetic population, or does it only cover the run through the IPU step?
I also like the publicized validation approach, and keeping that on the same repo sounds good.
Hi -- I'm trying to use SynthPop as part of a research project and am running into runtime issues. I'm applying the synthesizer to Mecklenburg County, NC, and am getting the following runtime for a single block group. Any suggestions?
I was super encouraged to see that @waddell was able to do the full Bay Area in 40 minutes.
Time to run ipu: 390.129s
IPU weights:
count    3.687000e+03
mean     1.933344e-01
std      4.484030e-01
min      3.711018e-11
25%      4.032434e-06
50%      7.556055e-05
75%      1.988441e-01
max      7.685979e+00
dtype: float64
Fit quality: 4.872272957062106
Number of iterations: 234
Drawing 620 households
The output above was produced with the following script:
from synthpop.recipes.starter2 import Starter
from synthpop.synthesizer import synthesize_all, enable_logging
import os
import pandas as pd
enable_logging()
# setting API Key
os.environ["CENSUS"] = "d95e144b39e17f929287714b0b8ba9768cecdc9f"
# Build the recipe for Mecklenburg County, NC
starter = Starter(os.environ["CENSUS"], "NC", "Mecklenburg County")
# Restrict the run to a single block group
ind = pd.Series(["37", "119", "005706", "4"], index=["state", "county", "tract", "block group"])
output = synthesize_all(starter, indexes=[ind])
output.to_csv("data/test_synth_output.csv")
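One way to narrow down where the time goes might be to profile the call with the standard-library `cProfile` module. The sketch below is only an illustration: `profile_call` is a hypothetical helper, not part of synthpop, and it has been tried here only on a toy function.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration; `top` limits the report to the
    entries with the largest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def toy_work(n):
    # Stand-in workload so the sketch runs without synthpop or a Census key.
    return sum(i * i for i in range(n))


result, report = profile_call(toy_work, 10000)
print(report)
```

Assuming the script above, the same helper could wrap the real call as `profile_call(synthesize_all, starter, indexes=[ind])`, which should show whether the IPU loop or another step accounts for most of the 390 seconds.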