AttributeError: 'MultiIndex' object has no attribute 'labels'
pandas_gbq==0.12.0
I have a script that has been running unchanged for a few weeks. Suddenly, in the latest run, I saw the error below. (I don't think it is an issue with my BigQuery table; it still has data in it and the schema hasn't changed, and even if it were empty, an empty dataframe should be returned without an error.) I haven't encountered this error before, but it looks like it happened when the script tried to read data from BigQuery like so:
query = """
select * from dataset.table;
"""
df = pandas_gbq.read_gbq(
    query,
    project_id="<my-project-name>",
    dialect='standard',
    use_bqstorage_api=True,
)
The following error was thrown:
Downloading: 0%| | 0/12926 [00:00<?, ?rows/s]Traceback (most recent call last):
File "/home/hadoop/metrics.py", line 1396, in <module>
main()
File "/home/hadoop/metrics.py", line 108, in my_function()
use_bqstorage_api=True)
File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1034, in read_gbq
progress_bar_type=progress_bar_type,
File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 532, in run_query
progress_bar_type=progress_bar_type,
File "/home/hadoop/conda/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 562, in _download_results
progress_bar_type=progress_bar_type,
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1636, in to_dataframe
bqstorage_client=bqstorage_client, dtypes=dtypes
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1414, in _to_page_iterable
for item in bqstorage_download():
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 683, in _download_table_bqstorage
future.result()
File "/home/hadoop/conda/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/home/hadoop/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/hadoop/conda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 584, in _download_table_bqstorage_stream
item = page_to_item(page)
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 572, in _bqstorage_page_to_dataframe
return page.to_dataframe(dtypes=dtypes)[column_names]
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/reader.py", line 406, in to_dataframe
return self._stream_parser.to_dataframe(self._message, dtypes=dtypes)
File "/home/hadoop/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/reader.py", line 567, in to_dataframe
df = record_batch.to_pandas()
File "pyarrow/table.pxi", line 917, in pyarrow.lib.RecordBatch.to_pandas
File "pyarrow/table.pxi", line 1410, in pyarrow.lib.Table.to_pandas
File "/home/hadoop/conda/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 621, in table_to_blockmanager
columns = _flatten_single_level_multiindex(columns)
File "/home/hadoop/conda/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 752, in _flatten_single_level_multiindex
labels, = index.labels
AttributeError: 'MultiIndex' object has no attribute 'labels'
This seems to be an issue with setting use_bqstorage_api=True; after switching it to False, the script ran just fine without any errors...
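For reference, a minimal sketch of that workaround (the query, project name, and table are just the placeholders from the report above):

import pandas_gbq

query = """
select * from dataset.table;
"""

# Workaround: disable the BigQuery Storage API so results are downloaded
# through the standard BigQuery REST API instead.
df = pandas_gbq.read_gbq(
    query,
    project_id="<my-project-name>",
    dialect='standard',
    use_bqstorage_api=False,
)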
Would you be able to share the schema of the table you're trying to read? That may help in reproducing this error.
Also, what versions of the following packages do you have installed?
- pandas-gbq
- pandas
- pyarrow
- google-cloud-bigquery
- google-cloud-bigquery-storage
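One quick way to collect those versions (a sketch using pkg_resources; anything equivalent, such as pip freeze, works just as well):

import pkg_resources

# Print the installed version of each package asked about above.
for name in (
    "pandas-gbq",
    "pandas",
    "pyarrow",
    "google-cloud-bigquery",
    "google-cloud-bigquery-storage",
):
    try:
        print(name, pkg_resources.get_distribution(name).version)
    except pkg_resources.DistributionNotFound:
        print(name, "not installed")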
I am facing the very same error. Here is my setup:
Schema of the table I am trying to query:
campaignid | INTEGER | NULLABLE |
smth | INTEGER | NULLABLE |
smth | INTEGER | NULLABLE |
smth | FLOAT | NULLABLE |
smth | INTEGER | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
smth | FLOAT | NULLABLE |
yyyymmdd | DATE | NULLABLE
The table is partitioned by yyyymmdd and clustered by campaignid.
And here is the environment:
pandas-gbq 0.13.0 py36_1 conda-forge
pandas 1.0.1 py36h0573a6f_0
pyarrow 0.14.0 py36h8b68381_0 conda-forge
google-cloud-bigquery 1.23.0 py_0 conda-forge
google-cloud-bigquery-core 1.23.0 py36_0 conda-forge
google-cloud-bigquery-storage 0.7.0 1 conda-forge
google-cloud-bigquery-storage-core 0.7.0 py36_1 conda-forge
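One observation that may be relevant (hedged, not a confirmed diagnosis): the traceback fails on `labels, = index.labels` inside pyarrow's pandas_compat.py, and pandas 1.0 removed the MultiIndex.labels attribute in favor of MultiIndex.codes, so a pyarrow 0.14.0 / pandas 1.0.1 combination like the one above could plausibly trip over exactly this. A minimal check:

import pandas as pd

# pandas 1.0 removed MultiIndex.labels (deprecated since 0.24) in favor of
# MultiIndex.codes; older pyarrow releases still accessed .labels when
# converting Arrow data to a DataFrame.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
print(mi.codes)               # works on pandas >= 0.24
print(hasattr(mi, "labels"))  # False on pandas >= 1.0, True on older versions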
Hi @megancooper and @cadama, the relevant client libraries have changed quite a bit since this issue was filed (the BigQuery Storage API is not only GA now, it is also at v2+). If you are still experiencing this issue using GA versions of the BigQuery client libraries with this connector, can you please file a new issue with additional repro info? Thanks!