Returning a large number of rows from SimpleQueryHandler::do_query() is extremely slow
Hi there. I've been experimenting with pgwire. When returning a large number of rows (even a single i32 column) from SimpleQueryHandler::do_query(), the results in psql are almost 10x slower than with Postgres.
- For example, returning 7.5M rows with a single i32 column from pgwire using TextDataRowEncoder takes approx. 6.5 secs in psql.
- The same result set from Postgres comes back in 600 ms.
Version: 0.7.0, OS: macOS (M1)
Thank you for reporting. I haven't had a chance to work on pgwire's performance yet. I will do some profiling to find the bottleneck. Contributions are welcome if you're interested in this part.
In #94 I'm enabling tcp_nodelay on the client socket by default; according to my pgbench test, performance is now on par with PostgreSQL.
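For reference, the fix boils down to disabling Nagle's algorithm on the accepted socket. A minimal standalone sketch using only `std::net` (the listener, address, and printout here are illustrative, not pgwire's actual server code):

```rust
use std::net::{TcpListener, TcpStream};

fn main() -> std::io::Result<()> {
    // Throwaway local listener so there is something to connect to.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let stream = TcpStream::connect(addr)?;
    // Disable Nagle's algorithm so small packets (e.g. individual
    // DataRow messages) are flushed immediately instead of being
    // coalesced while waiting for ACKs.
    stream.set_nodelay(true)?;
    println!("nodelay = {}", stream.nodelay()?);
    Ok(())
}
```

In pgwire the same `set_nodelay(true)` call would apply to the socket the server accepts for each client; with many tiny row messages, Nagle's delay dominates the round trip, which is why this single flag closes most of the gap.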
Oh, that's really cool.
I was looking at this again, and it still seems pretty slow. When executing a query with a lot of results (millions of rows) against Postgres directly, some values are returned to the client before the query has finished executing (which gives the impression that it's quicker). With pgwire, however, all results are produced before any are shown. Is this the case?
The streaming implementation I'm using to return results to the client is identical to the datafusion example.
Also, at Greptime, have you done performance testing with large result sets?
Sorry for the late response; I've been super busy these days. At Greptime we haven't covered this part of the postgres interface yet.
I'm going to check the code again, but IIRC we use a stream-based API to return results to the client. It seems my datafusion example has room for improvement: for a RecordBatch, we don't need to collect all results into a vector first. This might be why it looks blocking to you. I will find time to update the example.
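The eager-vs-lazy difference described above can be sketched with plain iterators. This is illustrative only: pgwire's real API hands rows back through a query-response stream, and the function names below are hypothetical:

```rust
/// Eager: materializes every encoded row before anything is written,
/// so the client sees nothing until the whole result set is buffered.
fn encode_all_eager(n: i32) -> Vec<String> {
    (0..n).map(|v| format!("row:{v}")).collect()
}

/// Lazy: yields rows one at a time, so a writer draining the iterator
/// can flush early rows to the socket while later ones are produced.
fn encode_lazy(n: i32) -> impl Iterator<Item = String> {
    (0..n).map(|v| format!("row:{v}"))
}

fn main() {
    // The lazy version makes the first row available before the
    // remaining millions have even been encoded.
    let mut lazy = encode_lazy(7_500_000);
    println!("{}", lazy.next().unwrap()); // prints "row:0" immediately

    // The eager version must allocate and fill the whole Vec first.
    let eager = encode_all_eager(3);
    assert_eq!(eager.len(), 3);
}
```

With the eager shape, the server buffers the full result set (and the client appears to stall); with the lazy shape, rows can hit the socket incrementally, matching the "results trickle in" behavior seen with Postgres itself.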
I just improved the performance of DataRowEncoder in #165; it should have twice the throughput in some cases.