
DuckDB dataset column names containing unconventional symbols make text-to-SQL fail

Open cyyeh opened this issue 1 year ago • 0 comments

Describe the bug

We now support reading CSV, JSON, and Parquet files using DuckDB. However, there are no strict rules on the column names used in a dataset, and this can cause trouble when generating valid SQL.

For example:

  • column names that start with a number, e.g. 3pointplay
  • column names that contain a dot (.)
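To make the two cases above concrete, here is a minimal sketch of a check that flags such names. The rule it uses (a letter or underscore followed by letters, digits, or underscores) is an assumption for illustration, not a rule WrenAI has defined. The name `find_unconventional` and the sample column list are hypothetical.

```python
import re

# Assumed rule (hypothetical): a letter or underscore first,
# then only letters, digits, or underscores.
CONVENTIONAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_unconventional(columns):
    """Return the column names that do not match the assumed rule."""
    return [name for name in columns if not CONVENTIONAL_NAME.fullmatch(name)]


# Illustrative names only: one starts with a digit, one contains a dot.
columns = ["id", "3pointplay", "impact.magnitude", "depth_km"]
print(find_unconventional(columns))  # ['3pointplay', 'impact.magnitude']
```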

To Reproduce

Steps to reproduce the behavior:

You can use the following DuckDB initSQL statement to import the dataset. This dataset contains column names with a dot (.).

CREATE TABLE earthquakes AS SELECT * FROM read_csv('https://corgis-edu.github.io/corgis/datasets/csv/earthquakes/earthquakes.csv', header=True)

Expected behavior

We should define a rule for column names and rewrite any original column names that break it.
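One possible shape for such a rewrite, as a rough sketch rather than a proposed implementation: replace every character outside `[A-Za-z0-9_]` with an underscore, prefix names that start with a digit, and add a numeric suffix when two rewritten names collide. The rule itself, the function name `normalize_column_names`, and the suffix scheme are all assumptions. Collisions are compared case-insensitively because DuckDB treats identifiers as case-insensitive. Reserved keywords are not handled here.

```python
import re


def normalize_column_names(columns):
    """Map each original column name to a rewritten name (assumed rule)."""
    mapping = {}
    used = set()  # lowercased names already assigned
    for original in columns:
        # Replace every character outside [A-Za-z0-9_] with an underscore.
        name = re.sub(r"[^A-Za-z0-9_]", "_", original)
        # Prefix names that are empty or start with a digit.
        if not name or name[0].isdigit():
            name = "_" + name
        # Resolve collisions by appending a numeric suffix (_2, _3, ...).
        candidate, n = name, 1
        while candidate.lower() in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate.lower())
        mapping[original] = candidate
    return mapping


# Illustrative input: a leading digit, a dot, and a resulting collision.
print(normalize_column_names(["3pointplay", "time.day", "time_day"]))
# {'3pointplay': '_3pointplay', 'time.day': 'time_day', 'time_day': 'time_day_2'}
```

Returning a mapping rather than a bare list keeps the original names available, so they could still be shown to users while the rewritten names are used in generated SQL.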

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Brave

WrenAI Information

  • Version: 0.2.1

Additional context

BigQuery column name rules: https://cloud.google.com/bigquery/docs/schemas#column_names

cyyeh, Apr 30 '24 02:04