arrow-flight-sql-postgresql

Improve `SELECT text` performance

Open kou opened this issue 2 years ago • 2 comments

`SELECT text` is slower than with the PostgreSQL protocol.

The following may be related, but we need to look into them:

  • Building arrow::RecordBatch is slow?
    • We need to build arrow::RecordBatches to use arrow::ipc::RecordBatchWriter, and we need to copy the PostgreSQL data to build them. (Not zero-copy.)
    • Should we add an API that writes Apache Arrow streaming format data without building arrow::RecordBatch to Apache Arrow C++?
  • Calling SPI_getbinval() is slow?
    • It calls nocachegetattr() ( https://github.com/postgres/postgres/blob/3edc6580c0e27fb8f13322efd255a88d20dda6c2/src/backend/access/common/heaptuple.c#L496-L712 ), which is not a short function. Can we shortcut some operations?
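
To illustrate why the first point implies a copy, here is a minimal sketch of the buffer layout Apache Arrow uses for a variable-length string column: n + 1 int32 offsets plus one contiguous data buffer. `Utf8Column` and `BuildColumn` are hypothetical names made up for this illustration. This is neither the Arrow C++ API nor this project's code. Because each row value arrives separately, its bytes have to be appended to the contiguous buffer one by one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-in for an Arrow utf8 column (hypothetical type, NOT the
// Arrow C++ API): n + 1 int32 offsets plus one contiguous data buffer.
struct Utf8Column {
  std::vector<int32_t> offsets{0};  // offsets[i]..offsets[i + 1] bounds value i
  std::string data;                 // all values concatenated

  // Copies the bytes of one row value into the contiguous data buffer.
  void Append(std::string_view value) {
    data.append(value.data(), value.size());
    offsets.push_back(static_cast<int32_t>(data.size()));
  }

  // Returns a view of value i without copying.
  std::string_view Get(std::size_t i) const {
    return std::string_view(data).substr(offsets[i],
                                         offsets[i + 1] - offsets[i]);
  }
};

// Hypothetical helper: converts row-oriented values to the columnar layout.
// Every value is copied once; this is the "not zero-copy" cost.
Utf8Column BuildColumn(const std::vector<std::string>& rows) {
  Utf8Column column;
  for (const auto& row : rows) {
    column.Append(row);
  }
  return column;
}
```

For example, `BuildColumn({"ab", "", "cde"})` produces the offsets `0, 2, 2, 5` and the data buffer `abcde`. Whether this copy is the actual bottleneck here still needs to be measured.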

kou avatar Nov 20 '23 07:11 kou

I was wondering if there is an update on the performance comparison, and whether any research has gone into whether this might work with https://github.com/timescale/timescaledb.

davidhcoe avatar Mar 22 '24 21:03 davidhcoe

I have some more ideas (e.g. the internal naive ring buffer implementation may be a bottleneck) but I haven't worked on this much yet.
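
As a rough illustration of the ring buffer idea, the sketch below contrasts a byte-at-a-time write with a bulk write that uses at most two `memcpy()` calls. The `RingBuffer` class and its methods are hypothetical and single-threaded. They are not the project's actual implementation, and whether this is the real bottleneck is an assumption that would need profiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical single-threaded byte ring buffer (not the project's code).
class RingBuffer {
 public:
  explicit RingBuffer(std::size_t capacity) : buffer_(capacity) {}

  std::size_t size() const { return size_; }

  // Naive write: one modulo operation and one store per byte.
  std::size_t WriteNaive(const char* src, std::size_t n) {
    std::size_t written = 0;
    while (written < n && size_ < buffer_.size()) {
      buffer_[(head_ + size_) % buffer_.size()] = src[written++];
      ++size_;
    }
    return written;
  }

  // Bulk write: at most two memcpy() calls (before and after wrap-around).
  std::size_t WriteBulk(const char* src, std::size_t n) {
    n = std::min(n, buffer_.size() - size_);  // clamp to free space
    if (n == 0) return 0;
    const std::size_t tail = (head_ + size_) % buffer_.size();
    const std::size_t first = std::min(n, buffer_.size() - tail);
    std::memcpy(buffer_.data() + tail, src, first);
    std::memcpy(buffer_.data(), src + first, n - first);
    size_ += n;
    return n;
  }

  // Reads up to n bytes into dst and returns the number of bytes read.
  std::size_t Read(char* dst, std::size_t n) {
    n = std::min(n, size_);  // clamp to available data
    if (n == 0) return 0;
    const std::size_t first = std::min(n, buffer_.size() - head_);
    std::memcpy(dst, buffer_.data() + head_, first);
    std::memcpy(dst + first, buffer_.data(), n - first);
    head_ = (head_ + n) % buffer_.size();
    size_ -= n;
    return n;
  }

 private:
  std::vector<char> buffer_;
  std::size_t head_ = 0;  // index of the oldest unread byte
  std::size_t size_ = 0;  // number of unread bytes
};
```

Both write paths store the same bytes and return the number of bytes accepted, so a benchmark could swap one for the other to check whether the per-byte loop matters.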

I haven't tried this with TimescaleDB but it will work because this doesn't care about which SELECT is executed.

kou avatar Mar 22 '24 22:03 kou