
CSV bulk insert performance

Open jamesmstone opened this issue 7 years ago • 3 comments

Environment

  • PostgreSQL version: Docker f0b44c71fee9 postgres:9.6
  • PostgREST version: Docker 3153f5b6d430 postgrest/postgrest
  • Operating system: Linux

Description of issue

Not an issue per se; however, I am getting poor performance when uploading a relatively small CSV file. I am essentially uploading time series data.

Below is the relevant excerpt of the db schema.

create table consumption_shape
(
	id serial not null
		constraint consumption_shape_pkey
			primary key,
	name text,
	filename text,
	owner_id integer default request.user_id()
		constraint consumption_shape_owner_id_fkey
			references "user"
)
;



create table shape_value
(
	id serial not null
		constraint shape_value_pkey
			primary key,
	time_ending timestamp not null,
	mwh double precision not null,
	consumption_shape integer not null
		constraint shape_value_consumption_shape_fkey
			references consumption_shape
)
;



CREATE VIEW shape_values AS SELECT shape_value.id,
    shape_value.time_ending,
    shape_value.mwh,
    shape_value.consumption_shape
   FROM data.shape_value;

The key table is the shape_value table.

Test

$ time curl 'http://172.17.0.1:3000/shape_values' -H 'Content-Type: text/csv' -H 'Accept: application/json' --data-binary @example.csv -D -
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Sat, 03 Nov 2018 05:48:56 GMT
Server: postgrest/5.1.0 (3cef4b7)
Content-Range: */*


real	0m18.528s

I have uploaded the CSV file I am sending as a gist.

I have noticed in the db logs that Postgres seems to be converting the CSV to JSON before inserting. Is this the expected behaviour?

Any help would be greatly appreciated, thanks!

I imagine I am sending the request incorrectly.

jamesmstone avatar Nov 03 '18 06:11 jamesmstone

Yes, we convert the CSV to JSON and then insert into Postgres. This is because our pg lib doesn't currently support the pg COPY command (see https://github.com/nikita-volkov/hasql/issues/1); once that's supported, we could insert the CSV in one go with no additional processing.
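To illustrate the conversion step, here is a rough sketch of the kind of insert that results once the CSV payload has been turned into JSON. This is not PostgREST's exact generated SQL, just an approximation using `json_populate_recordset` against the schema above:

```sql
-- Approximate shape of a JSON-based bulk insert (illustrative only).
WITH payload AS (
  SELECT '[{"time_ending":"2018-01-01T00:30:00",
            "mwh":1.2,
            "consumption_shape":1}]'::json AS j
)
INSERT INTO shape_value (time_ending, mwh, consumption_shape)
SELECT r.time_ending, r.mwh, r.consumption_shape
FROM payload,
     json_populate_recordset(null::shape_value, payload.j) AS r;
```

Every CSV row has to pass through this JSON representation, which is where the extra processing you saw in the db logs comes from.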

Bulk inserting JSON could also be made a lot faster, see https://github.com/PostgREST/postgrest/issues/690.
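For comparison, if you have direct database access, Postgres' native COPY loads a CSV with no JSON conversion at all (the file path and HEADER option below are assumptions about your file):

```sql
-- Baseline: server-side CSV load with no intermediate JSON step.
COPY shape_value (time_ending, mwh, consumption_shape)
FROM '/path/to/example.csv'
WITH (FORMAT csv, HEADER true);
```

This is the fast path PostgREST can't use yet because of the hasql limitation above.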

steve-chavez avatar Nov 06 '18 16:11 steve-chavez

Curious if the Hasql support is still a blocker for this feature?

nathancahill avatar Aug 12 '23 20:08 nathancahill

@nathancahill No, not really. In fact, I don't see us using COPY now, since it would prevent using libpq pipeline mode (https://github.com/PostgREST/postgrest/pull/2707).

I think the solution for this is https://github.com/PostgREST/postgrest/issues/2826, which will provide an escape hatch for doing your own CSV parsing in SQL (or any pg language, like plrust) with higher performance. I expect to work on it for the next release.
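As a sketch of what that escape hatch could look like today, you can already expose a function over PostgREST's `/rpc/` endpoint that accepts the raw CSV as text and parses it in SQL. The function name and the lack of header/quote handling are my assumptions, not a PostgREST feature:

```sql
-- Hypothetical RPC function: parse a naive CSV payload (no quoted
-- fields, no header row) and insert rows directly, skipping JSON.
create function import_shape_values(csv text) returns void
language sql as $$
  insert into shape_value (time_ending, mwh, consumption_shape)
  select (r[1])::timestamp,
         (r[2])::double precision,
         (r[3])::integer
  from regexp_split_to_table(csv, '\n') as line,
       string_to_array(line, ',') as r
  where line <> '';          -- ignore trailing blank lines
$$;
```

A client could then POST the file body to `/rpc/import_shape_values`, moving all parsing into a single round trip to the database.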

steve-chavez avatar Aug 14 '23 23:08 steve-chavez