SNOW-977836 Update dataframe.py
Refer to this doc -> https://docs.google.com/document/d/1BtcercvMKIqaMUzLWMDrJqS_JlKTniMFKqxGxB5FxiA
In the withColumnRenamed function, the string method upper() converts the column name to upper case. This breaks when the dataframe has two columns whose names differ only in case, e.g. column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename the 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method converts 'Snow Flake' to upper case, which also matches the existing 'SNOW FLAKE' column. The to_be_renamed list therefore ends up with 2 elements and an exception is raised.
Please answer these questions before submitting your pull requests. Thanks!
-
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes - #1148
-
Fill out the following pre-review checklist:
- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
-
Please describe how your code solves the related issue.
In the withColumnRenamed function in dataframe.py, at line 3541, the upper() function converts the column name to upper case. This breaks when the dataframe has two columns whose names differ only in case, e.g. column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename the 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method converts 'Snow Flake' to upper case, which also matches the existing 'SNOW FLAKE' column. The to_be_renamed list therefore ends up with 2 elements and an exception is raised. Removing the upper() call ensures that the column to be renamed is compared against the existing dataframe columns in their original case, so no exception is raised when multiple columns share a name that differs only in case.
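To illustrate the matching problem described above, here is a minimal pure-Python sketch. It is a simplified model and not the actual code in dataframe.py: the helpers `find_columns_to_rename` and `rename_column`, the `normalize_case` flag, and the use of `ValueError` are all hypothetical names and choices made for this example.

```python
def find_columns_to_rename(column_names, existing, normalize_case):
    """Return the names in column_names that match `existing` (hypothetical helper)."""
    if normalize_case:
        # Models the old behaviour: both sides are upper-cased before comparing.
        target = existing.upper()
        return [name for name in column_names if name.upper() == target]
    # Models the proposed behaviour: compare names exactly as they exist.
    return [name for name in column_names if name == existing]


def rename_column(column_names, existing, new, normalize_case):
    """Rename one column, raising if more than one column matches (hypothetical helper)."""
    matches = find_columns_to_rename(column_names, existing, normalize_case)
    if len(matches) > 1:
        raise ValueError(f"Ambiguous column name {existing!r}: matches {matches}")
    return [new if name in matches else name for name in column_names]


columns = ["Snow Flake", "SNOW FLAKE"]

# Exact comparison finds a single match, so the rename goes through.
print(rename_column(columns, "Snow Flake", "Snow Flake Renamed", normalize_case=False))

# Upper-casing makes both columns match, so the rename is rejected as ambiguous.
try:
    rename_column(columns, "Snow Flake", "Snow Flake Renamed", normalize_case=True)
except ValueError as exc:
    print(f"raised: {exc}")
```

With exact comparison only 'Snow Flake' is selected and 'SNOW FLAKE' is left untouched, which is the behaviour this PR aims for.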
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
This certainly fixes the issue, but I think tests need to be added in test_dataframe.py.
Also please see my comment on renaming of duplicated columns.
Sure, will add test cases around this functionality
Hey @suenalaba, pls see my comments on #1148. I've added a test case to test_dataframe.py named test_with_column_renamed_case_sensitivity()
looks good to me now