Pandas `read_sql()` does not preserve the correct case of column names.
Describe the bug
Pandas read_sql() (which uses SQLAlchemy) does not preserve the correct case of column names.
To Reproduce
import sqlalchemy as sa
import pandas as pd
e = sa.create_engine(db_uri)
with e.connect() as conn:
q = 'SELECT 1 one, 2 TWO, 3 "three", 4 "FOUR", 5 "Five" from rdb$database'
r = conn.exec_driver_sql(q)
print([x for x, *rest in r.cursor.description])
# ['ONE', 'TWO', 'three', 'FOUR', 'Five']
df = pd.read_sql(sql=q, con=db_uri, dtype_backend="pyarrow")
print(df.columns)
# Index(['one', 'two', 'three', 'four', 'Five'], dtype='object')
Expected behavior
print(df.columns)
# Index(['ONE', 'TWO', 'three', 'FOUR', 'Five'], dtype='object')
Columns should reflect the exact same name from cursor.description.
Unfortunately the root cause of this problem is in SQLAlchemy itself.
SQLAlchemy 2.1 will bring a driver_column_names execution option to fix this.
I'm looking forward to your pull request.
@pauldex I have a workaround for this issue which uses an obscure legacy hook from SQLAlchemy. But I'm not comfortable putting it in the wild. That's why I decided to wait for SQLAlchemy 2.1 driver_column_names.
Are you in need of a solution to fix this issue?
I think we can wait for SQLAlchemy 2.1 for the fix.