
Bug Report: SeriesInt64.median() gives different result than Pandas & doesn't work on BigQuery

Open thijs-obj opened this issue 3 years ago • 0 comments

Describe the bug

Two bugs really:

  1. SeriesInt64.median() gives a floored integer value, whereas Pandas seems to always give a float
  2. median() doesn't seem to work for BigQuery

Steps To Reproduce

def test_median(engine):
    # TODO: needs improvement, only testing one simple case here
    pdf = pd.DataFrame(data={'x': [1, 2, 3, 4]})
    df = DataFrame.from_pandas(engine=engine, df=pdf.reset_index(drop=False), convert_objects=True,
                               materialization='cte')
    df = df.reset_index(drop=True)
    df = df.drop(columns=['index'])
    sx = df.x.median()
    expected_value = 2.5
    # Check behaviour is the same as with pandas
    assert pdf.x.median() == expected_value
    # next step will fail, as the calculated value is `2` instead of `2.5`
    assert_equals_data(
        sx,
        expected_columns=['x'],
        expected_data=[[expected_value]]
    )

Expected behavior

  1. Test passes for Postgres
  2. Test passes for BigQuery

Additional context

  • Probably really easy to fix for Postgres by switching from percentile_disc to percentile_cont.
  • Make sure the return type of median() is correct (a float, to match Pandas)
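
To illustrate why the switch matters, here is a rough pure-Python sketch of the two PostgreSQL ordered-set aggregates, `percentile_disc` (discrete) and `percentile_cont` (continuous). The function bodies are my own approximation of the documented semantics, not code from bach or Postgres, and they ignore NULL handling and empty inputs.

```python
import math
import statistics


def percentile_disc(values, fraction):
    """Discrete percentile: the first ordered value whose cumulative
    position (rank / n) reaches the fraction. Always returns one of the
    input values, so an integer column stays integer."""
    ordered = sorted(values)
    n = len(ordered)
    # Smallest 1-based rank k such that k / n >= fraction.
    k = max(1, math.ceil(fraction * n))
    return ordered[k - 1]


def percentile_cont(values, fraction):
    """Continuous percentile: linear interpolation between the two
    neighbouring ordered values. Always returns a float."""
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)  # 0-based fractional position
    lower = math.floor(pos)
    upper = math.ceil(pos)
    weight = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


data = [1, 2, 3, 4]
print(percentile_disc(data, 0.5))  # picks an existing value
print(percentile_cont(data, 0.5))  # interpolates between 2 and 3
print(statistics.median(data))     # averages the two middle values
```

For the test data `[1, 2, 3, 4]` the discrete variant yields `2`, which matches the failing value in the test, while the continuous variant yields `2.5`, the same as the Pandas result asserted above.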

thijs-obj, Jul 18 '22 14:07