piccolo icon indicating copy to clipboard operation
piccolo copied to clipboard

Where JSONB column is_null produces incorrect query statement

Open theelderbeever opened this issue 4 years ago • 18 comments

python 3.10.1 piccolo: 0.66.0 db: Postgres

When doing a select or otherwise and comparing a JSONB columns to None or .is_null() the query string adds in a python object causing an error in the Postgres prepare statement.

Table.select(Table.id).where(Table.jsonb_column == None).run_sync()
Table.select(Table.id).where(Table.jsonb_column.is_null()).run_sync()

Emitted sql

SELECT "table"."id" FROM table WHERE "table"."jsonb_column" IS NULL'"<piccolo.columns.combination.Undefined object at 0x103eded10>"'

Error

asyncpg.exceptions.PostgresSyntaxError: syntax error at or near "NULL$1"

theelderbeever avatar Jan 29 '22 00:01 theelderbeever

You're right - I've been able to replicate this.

It's pretty strange. I'll look into it.

dantownsend avatar Jan 29 '22 17:01 dantownsend

Not sure if relevant or already addressed by this but when doing an Table.insert(Table(jsonb_column=None)) Piccolo doesn't appear to encode the value as a SQL NULL value which makes NOT NULL and IS NULL comparison not behave as expected. The value will show up as null in the database but it will be a jsonb serialized null instead of a SQL NULL.

theelderbeever avatar Jan 31 '22 17:01 theelderbeever

Maybe a SQL_NULL delegate so the ORM knows to add it to the query as NULL without quoting it?

theelderbeever avatar Jan 31 '22 17:01 theelderbeever

Tested with setting the column default to None and that value is appropriately set to a SQL NULL so this only occurs when doing an INSERT with a None.

theelderbeever avatar Jan 31 '22 18:01 theelderbeever

@theelderbeever Yes, you're right - I only discovered that yesterday. I added a fix for it as part of the same PR:

https://github.com/piccolo-orm/piccolo/pull/413

What I was thinking was, if the column is nullable:

  • Specify None for SQL null.
  • Specify. 'null' for JSON null

If the column isn't nullable then both None and 'null' give JSON null.

What do you think?

dantownsend avatar Jan 31 '22 18:01 dantownsend

With the risk of having too many configurable arguments... I think None always being SQL NULL is a bit more consistent. That way devs can rely on the NOT NULL constraint to reject values.

With that said.

nullable:

  • None == SQL NULL
  • 'null' == JSON nulll

not-nullable

  • None rejected on NOT NULL constraint
  • None with coerce_json_null=True arg == JSON null
  • 'null' == JSON null

Not entirely sure where the coerce_json_null arg lives though.... on the INSERT/UPDATE call probably?

Thoughts?

theelderbeever avatar Jan 31 '22 19:01 theelderbeever

@theelderbeever I think you're right - making None always mean SQL null is more straight forward and easier to understand.

dantownsend avatar Jan 31 '22 20:01 dantownsend

Another option for the coercion could be...

class Table(Table):
   jsonb_col = JSONB(none_as_null=True)`

theelderbeever avatar Jan 31 '22 20:01 theelderbeever

Also I promise to stop hammering on the json stuff soon but noted this as well...

This works...

await Table.raw(
    """
    INSERT INTO table (jsonb_column)
    VALUES 
        ({})
    """,
    '{"hello": "world"}'
)

This doesn't...

await Table.raw(
    """
    INSERT INTO table (jsonb_column)
    VALUES 
        ( '{"hello": "world"}' )
    """
)

The emitted SQL turns the json into an empty string as shown

 INSERT INTO table (jsonb_column)
    VALUES 
        ('')

which raises an error when run.

InvalidTextRepresentationError: invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.

Has a simple workaround obviously but, just something to take note of.

theelderbeever avatar Jan 31 '22 20:01 theelderbeever

@theelderbeever Ah, ok. Piccolo QueryString uses curly braces as placeholders, so it's seeing {"hello": "world"} as a placeholder instead of an actual value.

Internally Querystring uses Formatter.parse (docs) to deconstruct the string:

Formatter().parse('select * from foo where a = {}')

I can't figure out a way of making it ignore some curly braces. It seems like neither \{ or {{ works, so not sure what the solution is, besides replacing Formatter().parse with a different parser.

dantownsend avatar Jan 31 '22 21:01 dantownsend

Yeah I see what you mean... Not a lot of options there. Shame there isn't some combination of Template and Formatter.

Expounding on the above example when providing multiple values where an int or otherwise follows a jsonb the prepare statement fails in a strange way.

/asyncpg/protocol/prepared_stmt.pyx:197, in asyncpg.protocol.protocol.PreparedStatementState._encode_bind_msg()

DataError: invalid input for query argument $3: 10 (expected str, got int)
await Employees.raw(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES 
                ({}, {}, {})
        ) t (name, jsonb_data, salary)
    )
    
    UPDATE employees e
    SET
    	jsonb_data = t.jsonb_data,
    	salary = t.salary
    WHERE
    	e.name = t.name
    """,
    'Bob', '{"hello": "world"}', 100_000
)

Strangely, the emitted SQL can be pasted into another client and works fine.

This specifically occurs when the values get passed through the QueryString.

theelderbeever avatar Jan 31 '22 22:01 theelderbeever

@theelderbeever I'm trying to recreate that DataError issue. I made this simple script, but am getting a syntax error for the SQL - can you spot the issue?

from piccolo.table import Table
from piccolo.columns.column_types import Varchar, JSONB, Integer
from piccolo.engine.postgres import PostgresEngine


DB = PostgresEngine({"database": "jsonb_debug"})


class Employees(Table, db=DB):
    name = Varchar()
    jsonb_data = JSONB()
    salary = Integer()


def main():
    Employees.create_table(if_not_exists=True).run_sync()

    Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data, salary
            FROM (
                VALUES
                    ({}, {}, {})
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data,
            salary = t.salary
        WHERE
            e.name = t.name
        """,
        "Bob",
        '{"hello": "world"}',
        100_000,
    ).run_sync()


if __name__ == '__main__':
    main()

I'll experiment a bit more.

dantownsend avatar Feb 01 '22 22:02 dantownsend

Ah whoops copy-paste-edit error when creating a non project specific example....

This should do the trick. Two edits...

  1. Added FROM temp t after the SET clause.
  2. Also realized that the jsonb column has to be type cast coming from the temp table. So added t.jsonb_data::jsonb
      WITH temp as (
          SELECT name, jsonb_data, salary
          FROM (
              VALUES
                  ({}, {}, {})
          ) t (name, jsonb_data, salary)
      )

      UPDATE employees e
      SET
          jsonb_data = t.jsonb_data::jsonb,
          salary = t.salary
      FROM temp t
      WHERE
          e.name = t.name

theelderbeever avatar Feb 01 '22 23:02 theelderbeever

@theelderbeever I had a look into it a bit more, and I could replicate the error:

# Fails
await Employees.raw(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES
                ({}, {}, {})
        ) t (name, jsonb_data, salary)
    )

    UPDATE employees e
    SET
        jsonb_data = t.jsonb_data::jsonb,
        salary = t.salary
    FROM temp t
    WHERE
        e.name = t.name
    """,
    "Bob",
    '{"hello": "world"}',
    100_000,
)

The error is:

asyncpg.exceptions.DatatypeMismatchError: column "salary" is of type integer but expression is of type text
HINT:  You will need to rewrite or cast the expression.

It seems like the issue is with asyncpg. When I directly passed the query to asyncpg, bypassing Piccolo entirely, I got the same error:

# Fails
await DB._run_in_new_connection(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES
                ($1, $2, $3)
        ) t (name, jsonb_data, salary)
    )

    UPDATE employees e
    SET
        jsonb_data = t.jsonb_data::jsonb,
        salary = t.salary
    FROM temp t
    WHERE
        e.name = t.name
    """,
    [
        "Bob",
        '{"hello": "world"}',
        100_000
    ]
)

How about directly doing an update, rather than using WITH?

await DB._run_in_new_connection(
    """
    UPDATE employees e
    SET
        jsonb_data = $2,
        salary = $3
    WHERE
        e.name = $1
    """,
    [
        "Bob",
        '{"hello": "world"}',
        100_000
    ]
)

I'll create an issue in asyncpg.

dantownsend avatar Feb 02 '22 13:02 dantownsend

@theelderbeever I think I know the issue now.

If you do this, then it works OK:

    await Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data, salary
            FROM (
                VALUES
                    ({}, {}, {}::integer)    --         <------- here
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data::jsonb,
            salary = t.salary
        FROM temp t
        WHERE
            e.name = t.name
        """,
        "Bob",
        '{"hello": "world"}',
        100_000,
    )

Postgres must assume that each value coming back from the WITH statement will be a string, which is why it complains when we're passing an integer.

dantownsend avatar Feb 02 '22 14:02 dantownsend

How about directly doing an update, rather than using WITH?

Part of the reason I haven't been doing a direct approach is because the real application is a bulk update so the WHERE e.name = t.name is needed along with the CTE (or subquery).

Just discovered though that the VALUES list causes the bad typing not the CTE. And applying the casting in the SELECT seems to be the smoothest.

    await Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data::jsonb, salary::integer    --         <------- here
            FROM (
                VALUES
                    ({}, {}, {}),
                    ({}, {}, {})
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data,
            salary = t.salary
        FROM temp t
        WHERE
            e.name = t.name
        """,
        "Bob", '{"hello": "world"}', 100_000,
        "Mary", '{"hello": "mars"}', 1_000_000
    )

Thanks for all the work on this btw! Been a little deep in the weeds on this one.

theelderbeever avatar Feb 02 '22 15:02 theelderbeever

@theelderbeever No worries - it was a good learning experience.

dantownsend avatar Feb 02 '22 16:02 dantownsend

FWIW would it be valuable to have the querystring automatically attempt to json.dump lists and dictionaries when doing something like the above? Right now it just directly inserts the value and that tends to fail the sql. Which means that manually dumping the values has to be done.

Edit: it appears asyncpg wants absolutely everything to be a string... UUID, list/dict, and even numerics.

theelderbeever avatar Feb 02 '22 22:02 theelderbeever