BUG Fixing columns dropped from multi index in group by transform GH47787
- [x] closes #47787
- [x] Tests added and passed if fixing a bug or adding a new feature
- [x] All code checks passed.
- [x] Added type annotations to new arguments/methods/functions.
- [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Hello @mattB1989! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2022-08-16 17:05:43 UTC
@github-actions pre-commit
@mroeschke This is similar to #47672 but targets a different use case
@rhshadrach let me know if there is anything else you'd like me to change
@mattB1989 - what do you think of this:
def test_group_on_empty_multiindex(transformation_func):
    # GH 47787
    # Ensure empty vs non-empty frame/series gives same results
    df = DataFrame(
        data=[[1, Timestamp("today"), 3, 4]],
        columns=["col_1", "col_2", "col_3", "col_4"],
    )
    df = df.set_index(["col_1", "col_2"])
    args = get_groupby_method_args(transformation_func, df)
    result = df.iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df.groupby(["col_1"]).transform(transformation_func, *args)[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

    result = df["col_3"].iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df["col_3"].groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)
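For readers outside the pandas test suite, the property this test pins down can be sketched standalone. This is a rough illustration, not the PR's code: it assumes a pandas version that includes the GH 47787 fix, uses "cumsum" as one representative transform, and swaps Timestamp("today") for a fixed date so the run is reproducible.

```python
import pandas as pd

# One-row frame indexed by a two-level MultiIndex, mirroring the test setup.
# The fixed date is an assumption made here for reproducibility.
df = pd.DataFrame(
    data=[[1, pd.Timestamp("2022-08-16"), 3, 4]],
    columns=["col_1", "col_2", "col_3", "col_4"],
).set_index(["col_1", "col_2"])

# Transform on the full frame, grouping by one level of the index.
full_result = df.groupby(["col_1"]).transform("cumsum")

# Same transform on an empty slice of the same frame.
empty_result = df.iloc[:0].groupby(["col_1"]).transform("cumsum")

# The empty result should have the same columns as the non-empty one.
print(list(full_result.columns))
print(list(empty_result.columns))
```

Comparing against the non-empty result sliced to zero rows, as the suggested test does, avoids hand-building an expected empty frame for every transformation.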
That's much better than what I have. Committed it now.
There seem to be failures unrelated to the PR; not sure if I can rerun those manually.
The 32-bit tests are failing for diff and shift:
tm.assert_equal(result, expected)
E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="col_3") are different
E
E Attribute "dtype" are different
E [left]: int64
E [right]: int32
Can you set the dtype to np.int64 instead?
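A minimal sketch of why the cast matters, assuming NumPy's default integer follows the platform (32-bit on 32-bit builds): astype(int) takes that platform default, while astype(np.int64) pins the width everywhere. The Series below is illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd

# Illustrative data only; any integer Series would do.
ser = pd.Series([1, 2, 3], dtype="int16")

# Python's int maps to NumPy's default integer, whose width depends on the platform.
platform_default = ser.astype(int).dtype

# np.int64 is 64-bit regardless of platform.
pinned = ser.astype(np.int64).dtype

print(platform_default, pinned)
```

On a 64-bit Linux build both dtypes will typically print as int64, which is why the mismatch only surfaces in the 32-bit job.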
Thanks @mattB1989
Thanks for all the work here @mattB1989!