Bug Report: Exception `only bigquery supports snowplow flat format` on creating a DF with a non-existent `table_name`

Open ivarpruijn opened this issue 3 years ago • 0 comments

Describe the bug

When I create a DataFrame using get_objectiv_dataframe() with a non-existent table_name (e.g. 'data_clean' on staging), I receive a misleading exception: `only bigquery supports snowplow flat format.` (also note the improper capitalization).

Steps To Reproduce

Steps to reproduce the behavior:

  1. Call get_objectiv_dataframe() with a non-existent table for the table_name parameter.
  2. See the error.

Expected behavior

A descriptive error message telling me that the table cannot be found (plus proper capitalization of 'BigQuery' and 'Snowplow' in the current, misleading error).

Screenshots or Logs

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [25], in <cell line: 8>()
      6 modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'path'])
      7 port = connect_tunnel(ssh_host='[REDACTED]', db_port=5432)
----> 8 df = modelhub.get_objectiv_dataframe(db_url=f'postgresql://[REDACTED]@localhost:{port}/objectiv',
      9     start_date=start_date, 
     10     end_date=end_date,
     11     table_name='data_clean')

File ~/.local/lib/python3.8/site-packages/modelhub/modelhub.py:157, in ModelHub.get_objectiv_dataframe(self, db_url, table_name, start_date, end_date, bq_credentials_path, with_sessionized_data, session_gap_seconds, identity_resolution, anonymize_unidentified_users)
    154     else:
    155         table_name = 'data'
--> 157 data = get_objectiv_data(
    158     engine=engine,
    159     table_name=table_name,
    160     start_date=start_date,
    161     end_date=end_date,
    162     with_sessionized_data=with_sessionized_data,
    163     session_gap_seconds=session_gap_seconds,
    164     identity_resolution=identity_resolution,
    165     anonymize_unidentified_users=anonymize_unidentified_users,
    166     global_contexts=self._global_contexts
    167 )
    169 # get_objectiv_data returns both series as bach.SeriesJson.
    170 data['location_stack'] = data.location_stack.astype('objectiv_location_stack')

File ~/.local/lib/python3.8/site-packages/modelhub/pipelines/util.py:50, in get_objectiv_data(engine, table_name, session_gap_seconds, set_index, start_date, end_date, with_sessionized_data, identity_resolution, anonymize_unidentified_users, global_contexts)
     47 if identity_resolution and 'identity' not in global_contexts:
     48     global_contexts.append('identity')
---> 50 contexts_pipeline = ExtractedContextsPipeline(engine=engine, table_name=table_name,
     51                                               global_contexts=global_contexts)
     52 sessionized_pipeline = SessionizedDataPipeline(session_gap_seconds=session_gap_seconds)
     53 identity_pipeline = IdentityResolutionPipeline(identity_id=identity_resolution)

File ~/.local/lib/python3.8/site-packages/modelhub/pipelines/extracted_contexts.py:112, in ExtractedContextsPipeline.__init__(self, engine, table_name, global_contexts)
    109 else:
    110     # No taxonomy column, we need to gather the contexts separately.
    111     if not is_bigquery(engine):
--> 112         raise Exception('only bigquery supports snowplow flat format.')
    114     self._base_dtypes = self._get_bq_sp_base_dtypes(dtypes, global_contexts)
    116 self._validate_data_dtypes(
    117     expected_dtypes=self._base_dtypes,
    118     current_dtypes=dtypes,
    119 )

Exception: only bigquery supports snowplow flat format.
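
To illustrate the expected behavior, here is a minimal sketch of the check order I have in mind: verify that the table exists before deciding which data format applies. This is not the modelhub implementation. The function name `check_source_table`, the `TableNotFoundError` class, the `existing_tables` argument, and the taxonomy column name `'value'` are all assumptions made for the example. My guess (not verified) is that a missing table yields no columns, so the "no taxonomy column" branch in the traceback above is taken.

```python
from typing import Collection, Mapping

# Assumption: name of the column that marks the taxonomy (non-flat) format.
TAXONOMY_COLUMN = 'value'


class TableNotFoundError(Exception):
    """Hypothetical error type for a source table that does not exist."""


def check_source_table(
    table_name: str,
    existing_tables: Collection[str],
    dtypes: Mapping[str, str],
    engine_is_bigquery: bool,
) -> str:
    """Return the detected data format, or raise a descriptive error."""
    # Check existence first, so a wrong table_name is reported as such.
    if table_name not in existing_tables:
        available = ', '.join(sorted(existing_tables)) or '(none)'
        raise TableNotFoundError(
            f"Table '{table_name}' does not exist. "
            f"Available tables: {available}"
        )

    # Only now decide on the format, mirroring the branch in the traceback.
    if TAXONOMY_COLUMN in dtypes:
        return 'taxonomy'
    if not engine_is_bigquery:
        raise ValueError('Only BigQuery supports the Snowplow flat format.')
    return 'snowplow_flat'


if __name__ == '__main__':
    try:
        check_source_table(
            table_name='data_clean',
            existing_tables=['data'],
            dtypes={},  # a missing table has no columns
            engine_is_bigquery=False,
        )
    except TableNotFoundError as exc:
        print(exc)
```

With this ordering, the call from my notebook would fail with a message naming 'data_clean' instead of the Snowplow format error. In the real code, the list of existing tables would have to come from the database engine; how that is best obtained is left open here.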

Environment (please complete the following information where relevant):

  • Bach/MH 0.0.22.

ivarpruijn, Sep 15 '22 13:09