CSV bulk insert performance
Environment
- PostgreSQL version: 9.6 (Docker image postgres:9.6)
- PostgREST version: 5.1.0 (Docker image postgrest/postgrest)
- Operating system: Linux
Description of issue
Not an issue per se; however, I am getting poor performance when uploading a relatively small CSV file.
I am essentially uploading time series data.
Below is the relevant excerpt of the db schema.
create table consumption_shape
(
    id serial not null
        constraint consumption_shape_pkey
            primary key,
    name text,
    filename text,
    owner_id integer default request.user_id()
        constraint consumption_shape_owner_id_fkey
            references "user"
);
create table shape_value
(
    id serial not null
        constraint shape_value_pkey
            primary key,
    time_ending timestamp not null,
    mwh double precision not null,
    consumption_shape integer not null
        constraint shape_value_consumption_shape_fkey
            references consumption_shape
);
CREATE VIEW shape_values AS
SELECT shape_value.id,
       shape_value.time_ending,
       shape_value.mwh,
       shape_value.consumption_shape
FROM data.shape_value;
The key table is the shape_value table.
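For reference, the request body is plain CSV whose header row names the view's columns (the rows below are made-up examples, not my real data):

time_ending,mwh,consumption_shape
2018-01-01 00:30:00,1.25,1
2018-01-01 01:00:00,1.10,1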
Test
$ time curl 'http://172.17.0.1:3000/shape_values' -H 'Content-Type: text/csv' -H 'Accept: application/json' --data-binary @example.csv -D -
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Sat, 03 Nov 2018 05:48:56 GMT
Server: postgrest/5.1.0 (3cef4b7)
Content-Range: */*
real 0m18.528s
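For comparison, the equivalent direct load via psql would be something like the following (a sketch; it assumes the file's header row matches these column names):

\copy data.shape_value (time_ending, mwh, consumption_shape) from 'example.csv' with (format csv, header true)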
I have uploaded the CSV file I am sending as a gist.
I have noticed in the db logs that Postgres seems to be converting the CSV to JSON before inserting. Is this the expected behaviour? I imagine I am sending the request incorrectly.
Any help would be greatly appreciated, thanks!
Yes, we convert the CSV to JSON and then insert into Postgres. This is because our pg lib currently doesn't support the pg COPY command (see https://github.com/nikita-volkov/hasql/issues/1); once that's supported, we could insert the CSV in one go with no additional processing.
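Roughly, the generated statement looks like the sketch below. This is a simplified illustration of the CSV-to-JSON path, not the exact query PostgREST emits:

-- Simplified sketch: the CSV body is first turned into a JSON array,
-- then expanded into rows server-side and inserted through the view.
INSERT INTO shape_values (time_ending, mwh, consumption_shape)
SELECT time_ending, mwh, consumption_shape
FROM json_populate_recordset(
       null::shape_values,
       '[{"time_ending":"2018-01-01T00:30:00","mwh":1.25,"consumption_shape":1}]'
     );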
Bulk inserting JSON could also be made a lot faster, see https://github.com/PostgREST/postgrest/issues/690.
Curious if the Hasql support is still a blocker for this feature?
@nathancahill No, not really. In fact I don't see us using COPY now since it'll prevent using libpq pipeline mode (https://github.com/PostgREST/postgrest/pull/2707).
I think the solution for this is https://github.com/PostgREST/postgrest/issues/2826, which will provide an escape hatch for doing your own CSV parsing in SQL (or any pg language, like plrust) with higher performance. I expect to work on it for the next release.
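As a rough illustration of what that escape hatch could enable, here is a hypothetical SQL function that parses a raw CSV body itself. The function name and signature are my own invention, not the API proposed in that issue:

-- Hypothetical RPC that parses a raw CSV body in plain SQL and inserts
-- straight into the base table, skipping the JSON round trip.
create function import_shape_values(csv_body text) returns void
language sql as $$
  insert into shape_value (time_ending, mwh, consumption_shape)
  select cols[1]::timestamp, cols[2]::double precision, cols[3]::integer
  from (
    select string_to_array(line, ',') as cols
    from unnest(string_to_array(csv_body, E'\n')) with ordinality as t(line, n)
    where n > 1 and line <> ''  -- skip the header row and any trailing blank line
  ) s;
$$;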