objectiv-analytics
Bug Report: SeriesInt64.median() gives different result than Pandas & doesn't work on BigQuery
Describe the bug
Two bugs really:
- SeriesInt64.median() gives a floored integer value, whereas Pandas seems to always give a float
- median() doesn't seem to work for BigQuery
Steps To Reproduce
def test_median(engine):
    # TODO: needs improvement, only testing one simple case here
    pdf = pd.DataFrame(data={'x': [1, 2, 3, 4]})
    df = DataFrame.from_pandas(engine=engine, df=pdf.reset_index(drop=False), convert_objects=True,
                               materialization='cte')
    df = df.reset_index(drop=True)
    df = df.drop(columns=['index'])
    sx = df.x.median()
    expected_value = 2.5
    # Check behaviour is the same as with pandas
    assert pdf.x.median() == expected_value
    # next step will fail, as the calculated value is `2` instead of `2.5`
    assert_equals_data(
        sx,
        expected_columns=['x'],
        expected_data=[[expected_value]]
    )
Expected behavior
- Test passes for Postgres
- Test passes for BigQuery
Additional context
- Probably really easy to fix for Postgres by switching from percentile_disc to percentile_cont.
- Make sure the return type of median() is correct
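To illustrate why the choice of percentile function matters here, below is a minimal pure-Python sketch of the two semantics as Postgres documents them: the discrete variant returns a value that is present in the input, while the continuous variant interpolates between neighbours. The helper names mirror the SQL functions but are hypothetical; this is not the code used in the library, and it assumes a non-empty input.

```python
import math
import statistics


def percentile_disc(values, fraction):
    """Discrete percentile: always returns one of the input values."""
    ordered = sorted(values)
    # First value whose cumulative position (rank / n) reaches the fraction.
    rank = max(math.ceil(fraction * len(ordered)), 1)
    return ordered[rank - 1]


def percentile_cont(values, fraction):
    """Continuous percentile: linear interpolation between neighbours."""
    ordered = sorted(values)
    # Fractional, zero-based position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


data = [1, 2, 3, 4]
disc_median = percentile_disc(data, 0.5)  # an existing value from the input
cont_median = percentile_cont(data, 0.5)  # interpolated, always a float
print(disc_median, cont_median, statistics.median(data))
```

For the test data above, the discrete variant yields the integer 2 reported in this issue, while the continuous variant yields 2.5, matching the Pandas result.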