Improve the error message when `build_multivariate_dataframe` is given a list of stat_vars longer than the batch_size

Open sharadshriram opened this issue 3 years ago • 0 comments

cc: @shifucun

I was using a script that calls build_multivariate_dataframe with a stat_var list of more than 50 entries and got the following error:

Traceback (most recent call last):
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 88, in <module>
    save_statvar_to_csv(place, 'data.csv')
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 67, in save_statvar_to_csv
    df = dpd.build_multivariate_dataframe([place], stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 314, in build_multivariate_dataframe
    df = pd.DataFrame.from_records(_multivariate_pd_input(places, stat_vars))
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 238, in _multivariate_pd_input
    rows_dict = _group_stat_all_by_obs_options(places,
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 88, in _group_stat_all_by_obs_options
    stat_all = dc.get_stat_all(places, stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/stat_vars.py", line 226, in get_stat_all
    batches = -(-len(places) // places_per_batch)
ZeroDivisionError: integer division or modulo by zero

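However, the message "ZeroDivisionError: integer division or modulo by zero" did not help me understand what went wrong. After tracing back through the code, I found that the error was not caused by the batching of places itself, but by the length of the stat_vars list passed to dc.get_stat_all(places, stat_vars) being greater than 50. With a list that long, places_per_batch ends up as 0, which is what makes the division in the last traceback frame fail.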

Could the error message state that the stat_var list passed in is longer than the batch_size limit of 50?
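To make the request concrete, here is a minimal sketch of the kind of check I have in mind. It is not the library's code: the helper names `places_per_batch` and `num_batches`, the `BATCH_SIZE` constant, and the assumption that the number of places per batch is derived as `batch_size // len(stat_vars)` are all hypothetical. Only the ceiling-division line is taken from the traceback above.

```python
BATCH_SIZE = 50  # assumed limit, based on the behavior described in this issue


def places_per_batch(stat_vars, batch_size=BATCH_SIZE):
    """Hypothetical helper: how many places fit into one batch.

    Raises a descriptive ValueError up front instead of letting a
    zero result surface later as a ZeroDivisionError.
    """
    stat_vars = list(stat_vars)
    if not stat_vars:
        raise ValueError("stat_vars must contain at least one entry")
    if len(stat_vars) > batch_size:
        raise ValueError(
            f"Got {len(stat_vars)} stat_vars, but at most {batch_size} "
            "can be sent in one request; pass a shorter list or split it."
        )
    # Assumption: the batch budget is shared across all stat_vars.
    return batch_size // len(stat_vars)


def num_batches(places, stat_vars, batch_size=BATCH_SIZE):
    """Hypothetical helper: number of batches needed for all places."""
    per_batch = places_per_batch(stat_vars, batch_size)
    # Ceiling division, same trick as the line shown in the traceback.
    return -(-len(places) // per_batch)
```

With a check like this, passing 51 stat_vars would fail with a message that names both the list length and the limit, rather than with a division error several frames away from the real cause.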

I also wonder whether we could extend the get_stat_all() method to split long stat_var lists into chunks of at most 50 and issue one API query per chunk. I would like to hear your thoughts.
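A rough sketch of what the chunking could look like, written as a wrapper so it does not depend on the library internals. The names `chunked` and `get_stat_all_chunked` are hypothetical, and I am assuming that the fetch function (for example dc.get_stat_all) returns a dict keyed by place whose values are dicts keyed by stat_var; if the real response has a different shape, the merge step would need to change.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_stat_all_chunked(places, stat_vars, fetch, chunk_size=50):
    """Hypothetical wrapper: call `fetch` once per chunk of stat_vars.

    `fetch` is any callable with the signature fetch(places, stat_vars),
    assumed to return {place: {stat_var: data}}.
    """
    merged = {}
    for chunk in chunked(stat_vars, chunk_size):
        result = fetch(places, chunk)
        for place, by_stat_var in result.items():
            # Merge the stat_vars from this chunk into the per-place dict.
            merged.setdefault(place, {}).update(by_stat_var)
    return merged
```

In my script this would be called as get_stat_all_chunked([place], stat_vars, dc.get_stat_all). The trade-off is more API calls for long lists, but no single call would exceed the limit.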

sharadshriram, Oct 13 '22 04:10