BUG Fixing columns dropped from multi index in group by transform GH4…

Open mattB1989 opened this issue 3 years ago • 4 comments

…7787

  • [x] closes #47787
  • [x] Tests added and passed if fixing a bug or adding a new feature
  • [x] All code checks passed.
  • [x] Added type annotations to new arguments/methods/functions.
  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mattB1989 avatar Jul 24 '22 21:07 mattB1989

Hello @mattB1989! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-08-16 17:05:43 UTC

pep8speaks avatar Jul 24 '22 21:07 pep8speaks

@github-actions pre-commit

mattB1989 avatar Jul 24 '22 21:07 mattB1989

@mroeschke This is similar to #47672 but targets a different use case

mattB1989 avatar Jul 24 '22 21:07 mattB1989

@rhshadrach let me know if there is anything else you'd like me to change

mattB1989 avatar Aug 03 '22 19:08 mattB1989

@mattB1989 - what do you think of this:

def test_group_on_empty_multiindex(transformation_func):
    # GH 47787
    # Ensure empty vs non-empty frame/series gives same results
    df = DataFrame(
        data=[[1, Timestamp("today"), 3, 4]],
        columns=["col_1", "col_2", "col_3", "col_4"],
    )
    df = df.set_index(["col_1", "col_2"])
    args = get_groupby_method_args(transformation_func, df)

    result = df.iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df.groupby(["col_1"]).transform(transformation_func, *args)[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

    result = df["col_3"].iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df["col_3"].groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

rhshadrach avatar Aug 14 '22 02:08 rhshadrach

That's much better than what I have. I've committed it now.

mattB1989 avatar Aug 14 '22 07:08 mattB1989

There seem to be failures unrelated to the PR. I'm not sure whether I can rerun those manually.

mattB1989 avatar Aug 14 '22 16:08 mattB1989

The 32-bit tests are failing for diff and shift:

tm.assert_equal(result, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="col_3") are different
E       
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: int32

Can you set the dtype to np.int64 instead?

rhshadrach avatar Aug 15 '22 20:08 rhshadrach

Thanks @mattB1989

mroeschke avatar Aug 17 '22 01:08 mroeschke

Thanks for all the work here @mattB1989!

rhshadrach avatar Aug 17 '22 20:08 rhshadrach
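
For readers following the 32-bit failure reported above: a minimal sketch of why `astype(int)` can disagree with an `int64` result while `astype(np.int64)` does not. The frame below is a hypothetical stand-in for `expected` in the test, not the PR's actual code. It relies on `astype(int)` resolving to NumPy's default integer, whose width depends on the platform and NumPy version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the float-valued `expected` frame in the test.
expected = pd.DataFrame({"col_3": [3.0], "col_4": [4.0]})

# Python's `int` maps to NumPy's default integer, which can be 32-bit
# on 32-bit builds, so the resulting dtype varies between machines.
platform_dependent = expected.astype(int)

# An explicit NumPy dtype pins the width to 64 bits on every platform.
fixed_width = expected.astype(np.int64)

print(platform_dependent.dtypes.tolist())  # varies by platform
print(fixed_width.dtypes.tolist())
```

On a 64-bit Linux build the two usually coincide, which may be why the mismatch showed up only in the 32-bit job.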