objectiv-analytics
Bug Report: Exception `only bigquery supports snowplow flat format` on creating a DF with a non-existent `table_name`
Describe the bug
When I create a DataFrame using `get_objectiv_dataframe()` and pass a non-existent `table_name` (e.g. 'data_clean' on staging), I receive a misleading exception: `only bigquery supports snowplow flat format` (also note the improper capitalization).
Steps To Reproduce
Steps to reproduce the behavior:
- Call `get_objectiv_dataframe()` with a non-existent table for the `table_name` parameter.
- See the error.
Expected behavior
A descriptive error message telling me it cannot find the table. Separately, the current (wrong) error should capitalize 'BigQuery' and 'Snowplow' properly.
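To illustrate the expected behavior, here is a minimal sketch of the check order I have in mind. It is not the modelhub implementation: `validate_source_table`, `TableNotFoundError`, and all parameter names are hypothetical, and it assumes the caller can obtain the list of existing tables. My guess is that a missing table currently yields no columns, so the code falls through to the "no taxonomy column" branch shown in the traceback below.

```python
from typing import Iterable


class TableNotFoundError(Exception):
    """Hypothetical error type for a table that does not exist."""


def validate_source_table(
    table_name: str,
    existing_tables: Iterable[str],
    has_taxonomy_column: bool,
    engine_is_bigquery: bool,
) -> None:
    """Raise a descriptive error before the format check runs.

    All names here are illustrative assumptions, not modelhub API.
    """
    existing = sorted(set(existing_tables))
    # Check for existence first, so a typo in table_name is reported as such.
    if table_name not in existing:
        raise TableNotFoundError(
            f"Table '{table_name}' not found. Available tables: {existing}"
        )
    # Only then decide whether the table layout is supported by the engine.
    if not has_taxonomy_column and not engine_is_bigquery:
        raise ValueError("Only BigQuery supports the Snowplow flat format.")
```

With this ordering, passing `table_name='data_clean'` against a database that only has `data` would raise the "not found" error rather than the Snowplow one.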
Screenshots or Logs
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Input In [25], in <cell line: 8>()
6 modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'path'])
7 port = connect_tunnel(ssh_host='[REDACTED]', db_port=5432)
----> 8 df = modelhub.get_objectiv_dataframe(db_url=f'postgresql://[REDACTED]@localhost:{port}/objectiv',
9 start_date=start_date,
10 end_date=end_date,
11 table_name='data_clean')
File ~/.local/lib/python3.8/site-packages/modelhub/modelhub.py:157, in ModelHub.get_objectiv_dataframe(self, db_url, table_name, start_date, end_date, bq_credentials_path, with_sessionized_data, session_gap_seconds, identity_resolution, anonymize_unidentified_users)
154 else:
155 table_name = 'data'
--> 157 data = get_objectiv_data(
158 engine=engine,
159 table_name=table_name,
160 start_date=start_date,
161 end_date=end_date,
162 with_sessionized_data=with_sessionized_data,
163 session_gap_seconds=session_gap_seconds,
164 identity_resolution=identity_resolution,
165 anonymize_unidentified_users=anonymize_unidentified_users,
166 global_contexts=self._global_contexts
167 )
169 # get_objectiv_data returns both series as bach.SeriesJson.
170 data['location_stack'] = data.location_stack.astype('objectiv_location_stack')
File ~/.local/lib/python3.8/site-packages/modelhub/pipelines/util.py:50, in get_objectiv_data(engine, table_name, session_gap_seconds, set_index, start_date, end_date, with_sessionized_data, identity_resolution, anonymize_unidentified_users, global_contexts)
47 if identity_resolution and 'identity' not in global_contexts:
48 global_contexts.append('identity')
---> 50 contexts_pipeline = ExtractedContextsPipeline(engine=engine, table_name=table_name,
51 global_contexts=global_contexts)
52 sessionized_pipeline = SessionizedDataPipeline(session_gap_seconds=session_gap_seconds)
53 identity_pipeline = IdentityResolutionPipeline(identity_id=identity_resolution)
File ~/.local/lib/python3.8/site-packages/modelhub/pipelines/extracted_contexts.py:112, in ExtractedContextsPipeline.__init__(self, engine, table_name, global_contexts)
109 else:
110 # No taxonomy column, we need to gather the contexts separately.
111 if not is_bigquery(engine):
--> 112 raise Exception('only bigquery supports snowplow flat format.')
114 self._base_dtypes = self._get_bq_sp_base_dtypes(dtypes, global_contexts)
116 self._validate_data_dtypes(
117 expected_dtypes=self._base_dtypes,
118 current_dtypes=dtypes,
119 )
Exception: only bigquery supports snowplow flat format.
Environment (please complete the following information where relevant):
- Bach/MH 0.0.22.