WrenAI
Unconventional symbols in DuckDB dataset column names make text-to-SQL fail
Describe the bug
We now support reading CSV, JSON, and Parquet files using DuckDB. However, there are no strict rules on the column names used in a dataset, and some names cause trouble when we generate valid SQL.
For example:
- column names that start with a number, e.g. 3pointplay
- column names that contain a dot (.)
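Names like these are only invalid as bare identifiers. In DuckDB, as in standard SQL, any name becomes a valid identifier once it is wrapped in double quotes, with embedded double quotes escaped by doubling them. A minimal sketch, assuming SQL is assembled from column-name strings; `quote_identifier`, `select_columns`, and the table `my_table` are hypothetical and not part of WrenAI:

```python
def quote_identifier(name: str) -> str:
    # Wrap the name in double quotes and double any embedded double quote,
    # so names such as 3pointplay or location.name stay valid identifiers.
    return '"' + name.replace('"', '""') + '"'


def select_columns(table: str, columns: list[str]) -> str:
    # Build a SELECT that quotes every identifier instead of emitting it bare.
    quoted = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {quoted} FROM {quote_identifier(table)}"


print(select_columns("my_table", ["3pointplay", "location.name"]))
# SELECT "3pointplay", "location.name" FROM "my_table"
```

Quoting alone only helps if every generated query remembers to quote, which is hard to guarantee for LLM-generated SQL.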
To Reproduce
Steps to reproduce the behavior:
Use the following DuckDB initSQL statement to import the dataset. This dataset contains column names with dots (.).
CREATE TABLE earthquakes AS SELECT * FROM read_csv('https://corgis-edu.github.io/corgis/datasets/csv/earthquakes/earthquakes.csv', header=True)
Expected behavior
We should define a rule for column names and rewrite any original column name that breaks it.
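One possible shape for such a rule, sketched under assumptions: the rule chosen here (ASCII letters, digits, and underscores only; must not start with a digit; lowercased; collisions resolved with a numeric suffix) is illustrative, not a rule WrenAI has adopted, and the helper names and the `my_table` table are hypothetical. The original names are kept only in a quoted `SELECT ... AS ...` so that downstream SQL can use the normalized names:

```python
import re


def normalize_column_name(name: str) -> str:
    # Assumed rule: replace anything outside [A-Za-z0-9_] with "_".
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Names may not be empty or start with a digit, so prefix "_" in that case.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    # Lowercase so names that differ only by case are caught as collisions.
    return cleaned.lower()


def normalize_columns(names: list[str]) -> dict[str, str]:
    # Map each original name to a unique normalized name.
    seen: set[str] = set()
    mapping: dict[str, str] = {}
    for name in names:
        base = normalize_column_name(name)
        candidate, suffix = base, 1
        while candidate in seen:
            # e.g. "a.b" and "a_b" both normalize to "a_b"; the second becomes "a_b_1".
            candidate = f"{base}_{suffix}"
            suffix += 1
        seen.add(candidate)
        mapping[name] = candidate
    return mapping


def quote(name: str) -> str:
    # Double-quoted identifier with embedded quotes doubled.
    return '"' + name.replace('"', '""') + '"'


def build_rename_select(table: str, names: list[str]) -> str:
    # Select every original column under its normalized alias.
    mapping = normalize_columns(names)
    cols = ", ".join(f"{quote(src)} AS {quote(dst)}" for src, dst in mapping.items())
    return f"SELECT {cols} FROM {quote(table)}"


print(build_rename_select("my_table", ["3pointplay", "location.name"]))
# SELECT "3pointplay" AS "_3pointplay", "location.name" AS "location_name" FROM "my_table"
```

This sketch does not handle normalized names that collide with SQL reserved words (e.g. a column called order), so a real rule would need to cover that case as well.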
Desktop (please complete the following information):
- OS: macOS
- Browser: Brave
WrenAI Information
- Version: 0.2.1
Additional context
BigQuery column name rules: https://cloud.google.com/bigquery/docs/schemas#column_names