JSONB column compared to None / .is_null() produces an incorrect query statement
Python: 3.10.1, Piccolo: 0.66.0, DB: Postgres
When doing a select (or otherwise) and comparing a JSONB column to None or calling .is_null(), the query string adds in a Python object, causing an error in the Postgres prepared statement.
Table.select(Table.id).where(Table.jsonb_column == None).run_sync()
Table.select(Table.id).where(Table.jsonb_column.is_null()).run_sync()
Emitted SQL:
SELECT "table"."id" FROM table WHERE "table"."jsonb_column" IS NULL'"<piccolo.columns.combination.Undefined object at 0x103eded10>"'
Error:
asyncpg.exceptions.PostgresSyntaxError: syntax error at or near "NULL$1"
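For reference, the expected statement is presumably just the IS NULL comparison with nothing after it:
SELECT "table"."id" FROM table WHERE "table"."jsonb_column" IS NULL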
You're right - I've been able to replicate this.
It's pretty strange. I'll look into it.
Not sure if relevant or already addressed by this, but when doing a Table.insert(Table(jsonb_column=None)), Piccolo doesn't appear to encode the value as a SQL NULL, which makes NOT NULL and IS NULL comparisons not behave as expected. The value will show up as null in the database, but it will be a JSONB-serialized null instead of a SQL NULL.
Maybe a SQL_NULL delegate so the ORM knows to add it to the query as NULL without quoting it?
Tested with setting the column default to None, and that value is appropriately set to a SQL NULL, so this only occurs when doing an INSERT with None.
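The difference is easy to see directly in Postgres. A minimal check, run through raw SQL here (any client would do):
Table.raw(
    "SELECT 'null'::jsonb IS NULL AS json_null, NULL::jsonb IS NULL AS sql_null"
).run_sync()
# -> [{'json_null': False, 'sql_null': True}]
# A JSONB-serialized null is a real value, so IS NULL doesn't treat it as
# missing; only a SQL NULL satisfies the comparison (and the NOT NULL check).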
@theelderbeever Yes, you're right - I only discovered that yesterday. I added a fix for it as part of the same PR:
https://github.com/piccolo-orm/piccolo/pull/413
What I was thinking was, if the column is nullable:
- Specify None for SQL null.
- Specify 'null' for JSON null.
If the column isn't nullable, then both None and 'null' give JSON null.
What do you think?
At the risk of having too many configurable arguments... I think None always being SQL NULL is a bit more consistent. That way devs can rely on the NOT NULL constraint to reject values.
With that said:

nullable:
- None == SQL NULL
- 'null' == JSON null

not-nullable:
- None rejected on NOT NULL constraint
- None with coerce_json_null=True arg == JSON null
- 'null' == JSON null

Not entirely sure where the coerce_json_null arg lives though... on the INSERT/UPDATE call probably?
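To make that concrete, a hypothetical usage sketch (coerce_json_null doesn't exist yet, it's just the proposal above):
# Hypothetical behaviour under the proposed semantics -- nothing here exists yet.
await Table.insert(Table(jsonb_column=None))    # nullable column -> SQL NULL
await Table.insert(Table(jsonb_column='null'))  # -> JSON null ('null'::jsonb)

# On a NOT NULL column, None would be rejected by the constraint, unless the
# proposed coerce_json_null argument downgrades it to a JSON null:
await Table.insert(Table(jsonb_column=None), coerce_json_null=True)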
Thoughts?
@theelderbeever I think you're right - making None always mean SQL null is more straightforward and easier to understand.
Another option for the coercion could be...
class Table(Table):
    jsonb_col = JSONB(none_as_null=True)
Also, I promise to stop hammering on the JSON stuff soon, but noted this as well...
This works...
await Table.raw(
"""
INSERT INTO table (jsonb_column)
VALUES
({})
""",
'{"hello": "world"}'
)
This doesn't...
await Table.raw(
"""
INSERT INTO table (jsonb_column)
VALUES
( '{"hello": "world"}' )
"""
)
The emitted SQL turns the JSON into an empty string, as shown:
INSERT INTO table (jsonb_column)
VALUES
('')
which raises an error when run.
InvalidTextRepresentationError: invalid input syntax for type json
DETAIL: The input string ended unexpectedly.
It has a simple workaround obviously, but it's just something to take note of.
@theelderbeever Ah, ok. Piccolo QueryString uses curly braces as placeholders, so it's seeing {"hello": "world"} as a placeholder instead of an actual value.
Internally, QueryString uses Formatter.parse (docs) to deconstruct the string:
from string import Formatter
Formatter().parse('select * from foo where a = {}')
I can't figure out a way of making it ignore some curly braces. It seems like neither \{ nor {{ works, so I'm not sure what the solution is, besides replacing Formatter().parse with a different parser.
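For anyone following along, a quick sketch of what Formatter.parse yields for the failing statement:
from string import Formatter

sql = """INSERT INTO table (jsonb_column) VALUES ('{"hello": "world"}')"""
for literal_text, field_name, format_spec, conversion in Formatter().parse(sql):
    print((literal_text, field_name, format_spec))
# ('INSERT INTO table (jsonb_column) VALUES (\'', '"hello"', ' "world"')
# ("')", None, None)
# The JSON object is consumed as a replacement field ('"hello"' with format
# spec ' "world"'), so nothing of the value survives as literal text.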
Yeah I see what you mean... Not a lot of options there. Shame there isn't some combination of Template and Formatter.
Expounding on the above example: when providing multiple values, and an int (or otherwise) follows a JSONB value, the prepared statement fails in a strange way.
/asyncpg/protocol/prepared_stmt.pyx:197, in asyncpg.protocol.protocol.PreparedStatementState._encode_bind_msg()
DataError: invalid input for query argument $3: 10 (expected str, got int)
await Employees.raw(
"""
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
({}, {}, {})
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data,
salary = t.salary
WHERE
e.name = t.name
""",
'Bob', '{"hello": "world"}', 100_000
)
Strangely, the emitted SQL can be pasted into another client and works fine.
This specifically occurs when the values get passed through the QueryString.
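To see what actually gets handed to asyncpg, something like this should work (compile_string is what the Postgres engine appears to use internally - treat the exact API as an assumption on my part):
from piccolo.querystring import QueryString

qs = QueryString("VALUES ({}, {}, {})", "Bob", '{"hello": "world"}', 100_000)
sql, args = qs.compile_string(engine_type="postgres")
print(sql)   # VALUES ($1, $2, $3)
print(args)  # ['Bob', '{"hello": "world"}', 100000]
# The values pass through untouched, so the int really does reach asyncpg as
# an int - the mismatch happens when Postgres resolves the parameter types.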
@theelderbeever I'm trying to recreate that DataError issue. I made this simple script, but am getting a syntax error for the SQL - can you spot the issue?
from piccolo.table import Table
from piccolo.columns.column_types import Varchar, JSONB, Integer
from piccolo.engine.postgres import PostgresEngine
DB = PostgresEngine({"database": "jsonb_debug"})
class Employees(Table, db=DB):
name = Varchar()
jsonb_data = JSONB()
salary = Integer()
def main():
Employees.create_table(if_not_exists=True).run_sync()
Employees.raw(
"""
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
({}, {}, {})
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data,
salary = t.salary
WHERE
e.name = t.name
""",
"Bob",
'{"hello": "world"}',
100_000,
).run_sync()
if __name__ == '__main__':
main()
I'll experiment a bit more.
Ah, whoops - a copy-paste-edit error when creating a non-project-specific example...
This should do the trick. Two edits...
- Added FROM temp t after the SET clause.
- Also realized that the JSONB column has to be type cast coming from the temp table, so added t.jsonb_data::jsonb.
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
({}, {}, {})
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data::jsonb,
salary = t.salary
FROM temp t
WHERE
e.name = t.name
@theelderbeever I had a look into it a bit more, and I could replicate the error:
# Fails
await Employees.raw(
"""
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
({}, {}, {})
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data::jsonb,
salary = t.salary
FROM temp t
WHERE
e.name = t.name
""",
"Bob",
'{"hello": "world"}',
100_000,
)
The error is:
asyncpg.exceptions.DatatypeMismatchError: column "salary" is of type integer but expression is of type text
HINT: You will need to rewrite or cast the expression.
It seems like the issue is with asyncpg. When I passed the query straight through to asyncpg, bypassing Piccolo's QueryString entirely, I got the same error:
# Fails
await DB._run_in_new_connection(
"""
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
($1, $2, $3)
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data::jsonb,
salary = t.salary
FROM temp t
WHERE
e.name = t.name
""",
[
"Bob",
'{"hello": "world"}',
100_000
]
)
How about directly doing an update, rather than using WITH?
await DB._run_in_new_connection(
"""
UPDATE employees e
SET
jsonb_data = $2,
salary = $3
WHERE
e.name = $1
""",
[
"Bob",
'{"hello": "world"}',
100_000
]
)
I'll create an issue in asyncpg.
@theelderbeever I think I know the issue now.
If you do this, then it works OK:
await Employees.raw(
"""
WITH temp as (
SELECT name, jsonb_data, salary
FROM (
VALUES
({}, {}, {}::integer) -- <------- here
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data::jsonb,
salary = t.salary
FROM temp t
WHERE
e.name = t.name
""",
"Bob",
'{"hello": "world"}',
100_000,
)
Postgres must assume that each value coming back from the WITH statement will be a string, which is why it complains when we're passing an integer.
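That matches Postgres' type-resolution rules: a parameter in a bare VALUES list has no column context, so its unknown type falls back to text. A quick check (passing the value as a string, since that's what asyncpg will then expect):
Employees.raw(
    "SELECT pg_typeof(v)::text AS inferred FROM (VALUES ({})) s(v)",
    "100000",
).run_sync()
# -> [{'inferred': 'text'}]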
> How about directly doing an update, rather than using WITH?
Part of the reason I haven't been doing a direct approach is that the real application is a bulk update, so the WHERE e.name = t.name is needed along with the CTE (or subquery).
Just discovered, though, that the VALUES list causes the bad typing, not the CTE. Applying the casting in the SELECT seems to be the smoothest:
await Employees.raw(
"""
WITH temp as (
SELECT name, jsonb_data::jsonb, salary::integer -- <------- here
FROM (
VALUES
({}, {}, {}),
({}, {}, {})
) t (name, jsonb_data, salary)
)
UPDATE employees e
SET
jsonb_data = t.jsonb_data,
salary = t.salary
FROM temp t
WHERE
e.name = t.name
""",
"Bob", '{"hello": "world"}', 100_000,
"Mary", '{"hello": "mars"}', 1_000_000
)
Thanks for all the work on this btw! Been a little deep in the weeds on this one.
@theelderbeever No worries - it was a good learning experience.
FWIW, would it be valuable to have the QueryString automatically attempt to json.dumps lists and dictionaries when doing something like the above? Right now it just inserts the value directly, and that tends to fail the SQL, which means the values have to be dumped manually.
Edit: it appears asyncpg wants absolutely everything to be a string... UUID, list/dict, and even numerics.
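In the meantime, the manual version of what I'm suggesting looks something like this (just a sketch, reusing the Employees table from above):
import json

data = {"hello": "world"}
await Employees.raw(
    # Dump the dict to a JSON string ourselves, and cast explicitly, since
    # QueryString currently passes values through as-is.
    "UPDATE employees SET jsonb_data = {}::jsonb WHERE name = {}",
    json.dumps(data),
    "Bob",
)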