databricks-sql-python
Pandas makes bad DESCRIBE query when using SQLAlchemy
When using the SQLAlchemy engine with Pandas, it seems that Pandas makes a bad DESCRIBE query. Here is the code:
import os
import pandas as pd
from sqlalchemy import create_engine
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
engine = create_engine(
    f"databricks://token:{access_token}@{server_hostname}?http_path={http_path}&catalog=hive_metastore&schema=default",
)
with engine.connect() as connection:
    print(pd.read_sql("SELECT * FROM test", connection))
Here are the two queries resulting from that code:
It does not do that when using the SQL connector directly instead:
import os
import pandas as pd
from databricks import sql
with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    print(pd.read_sql("SELECT * FROM test", connection))
Here are version numbers:
In [1]: import sqlalchemy
In [2]: sqlalchemy.__version__
Out[2]: '1.4.49'
In [3]: from databricks import sql
In [4]: sql.__version__
Out[4]: '2.8.0'
Also, it would be nice if catalog and schema were optional.
What version of pandas do you have installed?
I was on 1.5.3, but I just tried 2.0.3 and get the same thing.
@susodapop I have the same issue - has this been resolved in the newer versions?
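A possible workaround, not confirmed by the maintainers in this thread: pd.read_sql accepts either a table name or a query, so when it is given a SQLAlchemy connectable it first asks the dialect whether the string names a table. That probe is the likely source of the extra DESCRIBE query, and it would also explain why the plain SQL connector, which pandas treats as a raw DBAPI connection, does not show it. Calling pd.read_sql_query treats the string as SQL directly and should skip the probe. The helper name read_query below is hypothetical, and the assumption that the Databricks dialect answers the probe with DESCRIBE is an inference from the behaviour reported above.

```python
import pandas as pd


def read_query(sql: str, connection) -> pd.DataFrame:
    """Run `sql` as a query without pandas' table-name check.

    `read_query` is a hypothetical helper name; it is not part of pandas
    or databricks-sql-python.
    """
    # pd.read_sql() may ask the dialect whether `sql` is a table name
    # before running it (assumed to surface as a DESCRIBE on Databricks).
    # pd.read_sql_query() always treats the string as a SQL query.
    return pd.read_sql_query(sql, connection)
```

With the engine from the original report, this would be used as print(read_query("SELECT * FROM test", connection)) inside the with engine.connect() block.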