snowpark-python icon indicating copy to clipboard operation
snowpark-python copied to clipboard

SNOW-977836 Update dataframe.py

Open Ilyas-kipi opened this issue 2 years ago • 6 comments

Refer this doc -> https://docs.google.com/document/d/1BtcercvMKIqaMUzLWMDrJqS_JlKTniMFKqxGxB5FxiA

In the withColumnRenamed function, the string function upper() is converting the column name to upper case and hence when the dataframe has two columns with same name but different case as show Ex- Column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method throws an exception as the method converts 'Snow Flake' to upper case but since the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list will have 2 elements and hence an exception will be raised.

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes - #1148

  2. Fill out the following pre-review checklist:

    • [x] I am adding a new automated test(s) to verify correctness of my new code
    • [ ] I am adding new logging messages
    • [ ] I am adding a new telemetry message
    • [ ] I am adding new credentials
    • [ ] I am adding a new dependency
  3. Please describe how your code solves the related issue.

    In the withColumnRenamed function in dataframe.py file, in line 3541, upper() function is converting the column name to upper case and hence when the dataframe has two columns with same name but different case as show Ex- Column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename 'Snow Flake' column to a name called 'Snow Flake Renamed', the current withColumnRenamed method throws an exception as the method converts 'Snow Flake' to upper case but since the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list will have 2 elements and hence an exception will be raised. Removing the upper function will make sure that we compare the columns to be renamed and the existing dataframe columns in the same case as they exist and will not raise an exception if we have multiple column with same name but different case.

Ilyas-kipi avatar Nov 25 '23 12:11 Ilyas-kipi

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

github-actions[bot] avatar Nov 25 '23 12:11 github-actions[bot]

I have read the CLA Document and I hereby sign the CLA

Ilyas-kipi avatar Nov 25 '23 12:11 Ilyas-kipi

certainly fixes the issue, but I think tests need to be added in test_dataframe.py.

Also please see my comment on renaming of duplicated columns.

suenalaba avatar Dec 14 '23 03:12 suenalaba

Sure, will add test cases around this functionality

Ilyas-kipi avatar Dec 19 '23 10:12 Ilyas-kipi

Hey @suenalaba , pls see my comments on #1148 , I've added test case to test_dataframe.py with the name test_with_column_renamed_case_sensitivity()

Ilyas-kipi avatar Dec 19 '23 18:12 Ilyas-kipi

looks good to me now

suenalaba avatar Dec 21 '23 13:12 suenalaba