pytd
Keep single quotes in InsertIntoWriter
What's changed
InsertIntoWriter currently converts every single quote in a value to a double quote, because in SQL single quotes delimit string literals and double quotes delimit column names.
However, this conversion silently corrupts user data that contains a single quote.
For example, the current behavior changes St. Patrick's day to St. Patrick"s day.
To fix this, this pull request keeps single quotes in the inserted values.
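As a rough illustration of the difference, here is a minimal sketch of the two quoting strategies. The helper names quote_old and quote_new are hypothetical and are not pytd's actual functions; the sketch only assumes that a single quote inside a SQL string literal is escaped by doubling it, which matches the queries shown in the example below.

```python
def quote_old(value: str) -> str:
    """Hypothetical sketch of the old behavior: replace ' with "."""
    return "'" + value.replace("'", '"') + "'"


def quote_new(value: str) -> str:
    """Hypothetical sketch of the new behavior: escape ' by doubling it."""
    return "'" + value.replace("'", "''") + "'"


if __name__ == "__main__":
    text = "St. Patrick's day"
    # The old strategy alters the stored data.
    print(quote_old(text))
    # The new strategy keeps the single quote, escaped for the SQL literal.
    print(quote_new(text))
```

With doubling, the database stores the original single quote, so the value round-trips unchanged.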
Example
Run the following code:
>>> import pytd
>>> import pandas as pd
>>> client = pytd.Client()
>>> df = pd.DataFrame(data={'name': ["John' Doe"], 'email': ['[email protected]'], 'description': ["Add two single quotes '', then insert new line \nfor a test."]})
>>> client.load_table_from_dataframe(df, 'akito_test.mytest', writer='insert_into', if_exists='append')
pytd then issues the following queries:
(1) Current behavior (single quotes converted to double quotes)
Job Info:
% td job:show 1815467019
JobID : 1815467019
...
Start At : 2023-05-18 08:29:41 UTC
End At : 2023-05-18 08:29:42 UTC
...
Query : -- client: pytd/1.4.0 (prestodb/0.8.3; tdclient/1.2.1)
-- Client#query
INSERT INTO akito_test.mytest (name, email, description) VALUES ('John" Doe', '[email protected]', 'Add two single quotes "", then insert new line
for a test.')
(2) New behavior (keep single quote)
Job Info:
% td job:show 1815470950
JobID : 1815470950
...
Start At : 2023-05-18 08:33:05 UTC
End At : 2023-05-18 08:33:07 UTC
...
Query : -- client: pytd/1.4.0 (prestodb/0.8.3; tdclient/1.2.1)
-- Client#query
INSERT INTO akito_test.mytest (name, email, description) VALUES ('John'' Doe', '[email protected]', 'Add two single quotes '''', then insert new line
for a test.')
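With this change, InsertIntoWriter sends single quotes without altering the data: each single quote is escaped as two single quotes ('') inside the SQL string literal.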