trino-python-client icon indicating copy to clipboard operation
trino-python-client copied to clipboard

Execute prepared statements on client side by default

Open damian3031 opened this issue 3 years ago • 2 comments

Draft PR to discuss approach for #306

damian3031 avatar Jan 05 '23 07:01 damian3031

Just to complete the information for @hashhar.

We discussed the following approaches

  1. We could also avoid the prepare and deallocation query by simply adding the prepared statement directly on the ClientSession. The advantage is that we can still make use of the query parser on the server side. The disadvantage is that we still use the headers as transport.

  2. Do the parsing on the clientside. then we have to take care of values and identifiers within the query that contain ?. eg. cur.execute("SELECT '?''?', ?", 'test') or cur.execute('select 1 as "?""test", ?', 'test')

  3. Transport the prepared statement parameters in the JSON body instead of the headers as a backwards compatible API change, that can be used by all clients instead of just the trino python client

mdesmet avatar Jan 05 '23 08:01 mdesmet

Doing on the client side makes the most sense to me depending on what problem we're trying to solve.

1 (directly manipulate ClientSession) seems like a hack. 2 (client side) solves the problem of multiple queries but introduces security concerns. 3 (new backward compatible API) solves the limitation that prepared statements cannot be very large but it doesn't help the fact that multiple queries need to be sent anyway.

I think we can do the following:

  • Make paramstyle settable and only allow it to be set to either format or qmark with qmark as default.
  • When we create Connection we "freeze" the paramstyle value i.e. a Connection's paramstyle is constant.
  • paramstyle = format would do client side bindings, paramstyle = qmark would do server side (what is already done today)
  • For paramstyle = format it's easier to do client side binding since you no longer need to worry about "finding" all markers and can just use str % values from Python to do the value setting.

For paramstyle = format we need to:

  • escape all params only - query text needs no escaping - if something needs escaping there (%) then the user should do that - this is what all Python drivers do (I checked psycopg2, pymysql, Oracle).
  • Do query % escaped_values and execute the query.

Where the handling of qmark and format style needs to be done? - to me it seems the logical place is to have two Cursor classes and override their execute and executemany methods. Alternatively we can keep existing cursor and introduce two methods process_qmark_params and process_format_params and call them from execute and executemany.

The question we need to answer

What's the biggest problem we want to solve?

  1. multiple queries sent when using params?
  2. the size limitation of prepared statements?

If we want to solve 1 then the only solution is what we discuss above. If we want to solve 2 then the API change that @mdesmet mentioned in his comment is a good solution.

To be honest I don't know which of these problems people actually care about.

The benefit of customizable paramstyle is that we can solve both problems at once but at the cost of some additional code and complexity (both for users and for us).

hashhar avatar Jan 06 '23 12:01 hashhar