Parquet export of timestamp columns without time zone adds time zone info anyway
When a DuckDB table is exported to Parquet, its timestamp columns are always written with time zone information, even when the original DuckDB column type (TIMESTAMP) carries no time zone.
Reproducible example:
```sql
-- Create the table with three columns
CREATE TABLE github_issue (
    id INTEGER,
    description TEXT,
    date DATE
);

-- Insert mock data into the table
INSERT INTO github_issue (id, description, date) VALUES
    (1, 'Issue with login functionality', '2025-03-01'),
    (2, 'Error in data processing pipeline', '2025-03-15'),
    (3, 'UI bug on the dashboard', '2025-03-25');

-- Create a new table github_issue_step_2 with the same structure, but with date as TIMESTAMP
CREATE TABLE github_issue_step_2 (
    id INTEGER,
    description TEXT,
    date TIMESTAMP
);

-- Copy the data from github_issue into github_issue_step_2,
-- converting the date column to a timestamp (no time zone)
INSERT INTO github_issue_step_2 (id, description, date)
SELECT id, description, CAST(date AS TIMESTAMP)
FROM github_issue;

-- Export to Parquet
COPY github_issue_step_2 TO './github_issue_step_2.parquet' (FORMAT 'parquet');
```
After running the script, inspect the resulting Parquet file: the values in the date column are shown with a trailing Z character, which denotes UTC.
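A viewer's rendering of the values is only indirect evidence, so it may help to look at the Parquet schema entry itself. The sketch below assumes a DuckDB version that provides the `parquet_schema` table function, and it writes a one-row scratch file (`ts_check.parquet` is a hypothetical name) so it runs on its own without the repro tables. In the logical type reported for the `date` column, `isAdjustedToUTC=true` would match the Z rendering, whereas a timestamp without time zone would, per the Parquet format specification, be expected to carry `isAdjustedToUTC=false`.

```sql
-- Write a single naive TIMESTAMP value to a scratch file (hypothetical file name)
COPY (SELECT TIMESTAMP '2025-03-01 00:00:00' AS date)
TO './ts_check.parquet' (FORMAT 'parquet');

-- Show the raw Parquet schema entry for the column; the converted/logical
-- type columns (where this DuckDB version reports them) expose isAdjustedToUTC
SELECT *
FROM parquet_schema('./ts_check.parquet')
WHERE name = 'date';

-- Show which DuckDB type the column is read back as
DESCRIBE SELECT * FROM './ts_check.parquet';
```

Running the same two queries against `./github_issue_step_2.parquet` checks the file produced by the repro above.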