
High memory usage

Open kozikowskik opened this issue 3 years ago • 8 comments

I ran a simple query: `select * from <table_name> where date_column >= '<date>';`. The result for this query is 1854211 rows; after I got the first page of results, the usql process had consumed 6GB of memory.

(screenshot: usql memory usage)

kozikowskik avatar Sep 12 '22 16:09 kozikowskik

I can confirm that running this query on PostgreSQL 13 uses about 1GB of memory, no matter if using a pager or not, or running in batch mode:

./usql 'postgres://postgres:pw@localhost:49153?sslmode=disable' -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;"

nineinchnick avatar Sep 12 '22 20:09 nineinchnick

(screenshot: pprof heap profile)

nineinchnick avatar Sep 12 '22 20:09 nineinchnick

The allocation itself should not be that big of an issue, but I'd expect the memory usage to drop once the query is complete and the results have been processed.

nineinchnick avatar Sep 12 '22 20:09 nineinchnick

Huh, I somehow thought `TableEncoder.count` had a non-zero default, but it's actually not used at all. Maybe we'd want to set it to something like 100k?

nineinchnick avatar Sep 12 '22 21:09 nineinchnick

Also, it looks like I have no idea how Go's GC is supposed to work. Calling `runtime.GC()` explicitly after executing the query drops the memory usage from under 1GB to 3MB.

nineinchnick avatar Sep 12 '22 21:09 nineinchnick

Using GNU time (not the shell's built-in `time`):

$ alias time="$(which time) -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
$ time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
	0:02.62 real,	2.07 user,	0.10 sys,	0 amem,	231272 mmem
$ time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
	0:03.37 real,	6.34 user,	0.25 sys,	0 amem,	1068812 mmem

It would appear that usql uses roughly 5x the memory and about 3x the user CPU time. I'll try to see if there are ways to reduce this memory usage, but I find it unlikely.

kenshaw avatar Sep 12 '22 21:09 kenshaw

See this proof-of-concept, where I enable batching (100k): https://github.com/xo/usql/compare/master...nineinchnick:usql:print-batch

% alias time="gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
% time ./usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
	0:04.83 real,	4.76 user,	0.23 sys,	0 amem,	121280 mmem
% time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
	0:04.48 real,	2.75 user,	0.12 sys,	0 amem,	233536 mmem
% time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
	0:04.74 real,	5.07 user,	0.30 sys,	0 amem,	1219200 mmem
% usql --version
usql 0.12.13
% psql --version
psql (PostgreSQL) 14.5 (Homebrew)

The drawback is that the width of the table (and its columns) can change every 100k rows. I wouldn't enable this by default, but maybe allow setting it via an option? We could also call `runtime.GC()` after the query finishes executing, to reduce memory usage when a user just forgot to add a LIMIT to the query and doesn't actually need that many results :-)

Also I could rebase https://github.com/xo/tblfmt/pull/18

nineinchnick avatar Sep 13 '22 07:09 nineinchnick