Parquet export of timestamp columns without time zone adds time zone info anyway

Open · xuxoramos opened this issue 10 months ago · 0 comments

When a DuckDB table is exported to Parquet, time zone information is added to its timestamp columns, even though the original DuckDB TIMESTAMP type carries no time zone information.

Reproducible example:

```sql
-- Create the table with three columns
CREATE TABLE github_issue (
    id INTEGER,
    description TEXT,
    date DATE
);

-- Insert mock data into the table
INSERT INTO github_issue (id, description, date) VALUES
(1, 'Issue with login functionality', '2025-03-01'),
(2, 'Error in data processing pipeline', '2025-03-15'),
(3, 'UI bug on the dashboard', '2025-03-25');

-- Create a new table github_issue_step_2 with the same structure but date as TIMESTAMP
CREATE TABLE github_issue_step_2 (
    id INTEGER,
    description TEXT,
    date TIMESTAMP
);

-- Insert data from github_issue into github_issue_step_2
-- Convert the date column to a timestamp
INSERT INTO github_issue_step_2 (id, description, date)
SELECT id, description, CAST(date AS TIMESTAMP)
FROM github_issue;

COPY github_issue_step_2 TO './github_issue_step_2.parquet' (FORMAT 'parquet');
```

After running the script, inspect the resulting Parquet file: the timestamp values carry a trailing Z character, which indicates the UTC time zone.
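One way to check what was actually written, independent of how a particular viewer renders the values, is to read the file's schema metadata from DuckDB itself. The query below is a sketch that assumes a DuckDB version providing the `parquet_schema` table function with its `converted_type` and `logical_type` columns, and the same output path as the script above. The `logical_type` value for the `date` column describes the TIMESTAMP annotation, including whether the writer flagged the values as adjusted to UTC.

```sql
-- Show the Parquet schema entries written by the COPY above.
-- logical_type describes the TIMESTAMP annotation for the date column,
-- including its UTC-adjustment flag and time unit.
SELECT name, type, converted_type, logical_type
FROM parquet_schema('./github_issue_step_2.parquet');

-- Show which DuckDB type each column maps to when the file is read back.
DESCRIBE SELECT * FROM './github_issue_step_2.parquet';
```

Comparing the `DESCRIBE` output with the original `github_issue_step_2` table shows whether the `date` column round-trips as plain TIMESTAMP or comes back with time zone semantics attached.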

xuxoramos · Mar 30 '25 06:03