
AttributeError: 'MultiIndex' object has no attribute 'labels'

Open megancooper opened this issue 5 years ago • 3 comments

pandas_gbq==0.12.0

I have a script that has been running for a few weeks without any changes. Suddenly, the latest run produced the error below. I don't think it was an issue with my BigQuery table: it still has data in it and the schema hasn't changed, and even if it were empty, an empty dataframe should be returned without error. I haven't encountered this error before, but it appears when the script tries to read data from BigQuery like so:

query = """
    select * from dataset.table;
"""
df = pandas_gbq.read_gbq(query,
                         project_id="<my-project-name>",
                         dialect='standard',
                         use_bqstorage_api=True)

The following error was thrown:

Downloading:   0%|          | 0/12926 [00:00<?, ?rows/s]Traceback (most recent call last):
  File "/home/hadoop/metrics.py", line 1396, in <module>
    main()
  File "/home/hadoop/metrics.py", line 108, in my_function()
    use_bqstorage_api=True)
  File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1034, in read_gbq
    progress_bar_type=progress_bar_type,
  File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 532, in run_query
    progress_bar_type=progress_bar_type,
  File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 562, in _download_results
    progress_bar_type=progress_bar_type,
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1636, in to_dataframe
    bqstorage_client=bqstorage_client, dtypes=dtypes
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1414, in _to_page_iterable
    for item in bqstorage_download():
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 683, in _download_table_bqstorage
    future.result()
  File "/home/hadoop/conda/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/hadoop/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/hadoop/conda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 584, in _download_table_bqstorage_stream
    item = page_to_item(page)
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 572, in _bqstorage_page_to_dataframe
    return page.to_dataframe(dtypes=dtypes)[column_names]
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/reader.py", line 406, in to_dataframe
    return self._stream_parser.to_dataframe(self._message, dtypes=dtypes)
  File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/reader.py", line 567, in to_dataframe
    df = record_batch.to_pandas()
  File "pyarrow/table.pxi", line 917, in pyarrow.lib.RecordBatch.to_pandas
  File "pyarrow/table.pxi", line 1410, in pyarrow.lib.Table.to_pandas
  File "/home/hadoop/conda/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 621, in table_to_blockmanager
    columns = _flatten_single_level_multiindex(columns)
  File "/home/hadoop/conda/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 752, in _flatten_single_level_multiindex
    labels, = index.labels
AttributeError: 'MultiIndex' object has no attribute 'labels'

megancooper avatar Jan 31 '20 15:01 megancooper

This seems to be an issue with setting use_bqstorage_api=True; after switching it to False, the script ran just fine without any errors.
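For reference, this is a sketch of the working call; `"<my-project-name>"` and the query are placeholders from the original report:

```python
import pandas_gbq

query = """
    select * from dataset.table;
"""

# Workaround: disable the BigQuery Storage API download path, which is
# where the pyarrow/pandas MultiIndex incompatibility is triggered. The
# slower REST-based download does not hit that code path.
df = pandas_gbq.read_gbq(query,
                         project_id="<my-project-name>",
                         dialect='standard',
                         use_bqstorage_api=False)
```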

megancooper avatar Jan 31 '20 17:01 megancooper

Would you be able to share the schema of the table you're trying to read? That may help in reproducing this error.

Also, what versions of the following packages do you have installed?

  • pandas-gbq
  • pandas
  • pyarrow
  • google-cloud-bigquery
  • google-cloud-bigquery-storage

tswast avatar Feb 07 '20 16:02 tswast

I am facing the very same error. Here is my set up:

Schema of the table I am trying to query:

campaignid | INTEGER | NULLABLE |  
smth | INTEGER | NULLABLE |  
smth | INTEGER | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | INTEGER | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
smth | FLOAT | NULLABLE |  
yyyymmdd | DATE | NULLABLE

the table is partitioned by yyyymmdd and clustered by campaignid.

And here is the environment:

pandas-gbq                0.13.0                   py36_1    conda-forge
pandas                    1.0.1            py36h0573a6f_0  
pyarrow                   0.14.0           py36h8b68381_0    conda-forge
google-cloud-bigquery     1.23.0                     py_0    conda-forge
google-cloud-bigquery-core 1.23.0                  py36_0    conda-forge
google-cloud-bigquery-storage 0.7.0                		1    conda-forge
google-cloud-bigquery-storage-core 0.7.0           py36_1    conda-forge

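That version combination likely explains the error: pandas 1.0 removed `MultiIndex.labels` (deprecated since 0.24 in favor of `MultiIndex.codes`), while pyarrow 0.14.0's `pandas_compat` still does `labels, = index.labels`. A minimal demonstration, assuming pandas >= 1.0:

```python
import pandas as pd

# A single-level MultiIndex, like the one pyarrow builds for the columns.
mi = pd.MultiIndex.from_arrays([["a", "b", "a"]])

# The modern spelling works on pandas >= 0.24:
codes, = mi.codes
print(list(codes))  # [0, 1, 0]

# On pandas >= 1.0 the old attribute is gone, so pyarrow 0.14.0's
# `labels, = index.labels` raises AttributeError, as in the traceback above.
print(hasattr(mi, "labels"))  # False
```

Upgrading pyarrow to a release that uses `.codes` (or pinning pandas below 1.0) should avoid the crash without giving up the BigQuery Storage API.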
cadama avatar Feb 27 '20 12:02 cadama

Hi @megancooper and @cadama — the relevant client libraries have changed quite a bit since this issue was filed (the BigQuery Storage API client is now GA and at v2+). If you are still experiencing this issue using GA versions of the BigQuery client libraries with this connector, please file a new issue with additional repro info. Thanks!

meredithslota avatar Feb 07 '23 04:02 meredithslota