Why is query parsing mandatory?
Hi.
What is the rationale for making SQL parsing mandatory? Even though the parser is lightweight, it definitely adds latency, especially for lengthy SQL. In my case it adds up to ~100 ms to each invocation of a complex query.
It would be great if there was a flag that could turn SQL parsing off (as it was in previous versions).
Thanks
What is the rationale for making SQL parsing mandatory?
In the beginning, the parser was introduced to address two issues:
- stability: it helps identify whether a query is idempotent and therefore safe to retry when a "failed to respond" error occurs.
- parsing problems that are hard to fix with regular expressions.
Later it was also used for features like multi-statement query support, SQL rewriting and client-side prepared statements. From then on the parser was mandatory, so I removed the option.
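To illustrate the idempotency point above, here is a minimal sketch of a retry wrapper. It is not the driver's code: the class and method names are hypothetical, and a crude prefix check stands in for the parser, which is exactly the kind of shortcut that breaks on statements such as `with ... select`.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.concurrent.Callable;

// Hypothetical names; a sketch of "retry only when the query is idempotent".
final class RetrySketch {
    /** Crude stand-in for the parser: treats common read-only statements as idempotent. */
    static boolean isIdempotent(String sql) {
        String s = sql.trim().toLowerCase(Locale.ROOT);
        return s.startsWith("select") || s.startsWith("show")
                || s.startsWith("describe") || s.startsWith("exists");
    }

    /** Runs the call and retries on I/O failure only if the statement is idempotent. */
    static <T> T execute(String sql, Callable<T> call, int maxRetries) throws Exception {
        int attempts = 0;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) { // e.g. a "failed to respond" error
                if (!isIdempotent(sql) || attempts++ >= maxRetries) {
                    throw e; // not safe to retry, or retries exhausted
                }
            }
        }
    }
}
```

An insert that fails this way is rethrown immediately, because replaying it could write the data twice.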
It would be great if there was a flag that could turn SQL parsing off (as it was in previous versions).
Most JDBC drivers don't parse the whole SQL statement because they rely on the server, so yes, I agree it's better to keep the option. However, if the major concern is performance and/or customizing the behavior, wouldn't it be better to use a non-JDBC async client? FYI, in the upcoming clickhouse-client and clickhouse-grpc-client, parsing is disabled by default (you can still turn it on as needed), and you have full control there :)
I think that the client should be as lightweight as possible.
Parsing SQL on the client side can bring you some cool features, I agree with that, but since it comes with a performance penalty it should be optional. Maybe have something like "extended API" for reads (just like you have for writes) where caller can turn parsing off and specify explicitly whether his query is idempotent or not, etc?
FYI, in the upcoming clickhouse-client and clickhouse-grpc-client, parsing is disabled by default (you can still turn it on as needed), and you have full control there :)
Great, waiting for it :)
I think that the client should be as lightweight as possible.
I agree with you. That's one of the reasons why we want to split the driver into multiple modules.
Maybe have something like "extended API" for reads (just like you have for writes) where caller can turn parsing off and specify explicitly whether his query is idempotent or not, etc?
The extended API will gradually be replaced by the new client, but we can definitely add an option there for enabling/disabling parsing as needed (disabled by default for consistency).
But why does CH parse and execute SQL queries faster than the JDBC driver parses them?
But why does CH parse and execute SQL queries faster than the JDBC driver parses them?
Language differences aside, the parser in the JDBC driver is not optimized. The workaround I added to skip the parts that are not of interest (e.g. with-statements, sub-queries, inline arrays/tuples, and function arguments, via anyExpr) increased the number of loops and generated more small objects. This became an issue when dealing with large SQL, as discussed in #615, and it failed on insert into ... format <format>\n<raw data>, as shown in #652 (the parser should skip the raw data).
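For the insert into ... format <format>\n<raw data> case, the idea of skipping the raw data can be sketched as splitting the statement at the first line break after the format name, so that only the short head is ever parsed. This is an illustration only, not what the driver does: the class name is hypothetical, it uses a regular expression for brevity, and it would be fooled by the word "format" inside a string literal or comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; separates the SQL head from the raw data payload.
final class InsertSplitSketch {
    // Head: "insert into ... format <name>"; payload: everything after the next line break.
    private static final Pattern INSERT_WITH_DATA = Pattern.compile(
            "(\\s*insert\\s+into\\b.*?\\bformat\\s+\\w+)[ \\t]*\\r?\\n(.*)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {statement, rawData}; rawData is empty when no inline data is found. */
    static String[] split(String sql) {
        Matcher m = INSERT_WITH_DATA.matcher(sql);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { sql, "" };
    }
}
```

Only the first element would need to go through a parser; the second can be streamed to the server untouched, however large it is.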
Besides optimizing the parser, while working on the new client I'm trying a different approach to support advanced data types like Array, Nested and Tuple: avoiding heavy parsing and regular expressions, which I believe should perform better.
Update:
In the new Java client, a few methods are being added to ClickHouseUtils for parsing, for example skipBrackets and skipContentsUntil. Named parameters are actually built on top of them. I still need time to add more tests and tidy up the code, but I think you get the idea.
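To give a feel for this style of scanning, here is a rough sketch of a bracket-skipping helper. It only borrows the name skipBrackets: the signature and behavior are my assumptions, not the actual ClickHouseUtils code. It walks the string once, tracks nesting depth, and jumps over single-quoted strings so that brackets inside them are ignored.

```java
// Hypothetical sketch; not the real ClickHouseUtils implementation.
final class SkipSketch {
    /** Returns the index just after the bracket that closes the one at {@code start}. */
    static int skipBrackets(String s, int start) {
        char open = s.charAt(start);
        char close;
        switch (open) {
            case '(': close = ')'; break;
            case '[': close = ']'; break;
            case '{': close = '}'; break;
            default: throw new IllegalArgumentException("Not a bracket at index " + start);
        }
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                i = skipQuoted(s, i) - 1; // the loop increment moves past the closing quote
            } else if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("Unbalanced bracket starting at index " + start);
    }

    /** Returns the index just after the single-quoted string that begins at {@code start}. */
    static int skipQuoted(String s, int start) {
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '\'') {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("Unterminated string starting at index " + start);
    }
}
```

For example, `SkipSketch.skipBrackets("f(a, (b), ')') + 1", 1)` returns 14, the position right after the argument list, without building any tokens or intermediate objects.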
As for the JavaCC parser, it has been removed from clickhouse-client (the core module), so there is no option there to turn "parsing" on or off. However, I think it's better to keep it in clickhouse-jdbc (the JDBC driver), as it's still useful for extracting database and table names etc., but I'll see if I can add the option back.