
DuckDB package size

Status: Open. sean-legitscript opened this issue 1 year ago • 4 comments

Hey all, we want to use duckdb for our Parquet parsing needs in our Lambda functions. I ran npm install duckdb and it installed without issue, and I can successfully parse my Parquet files. The problem comes when deploying my Lambda stack. During deployment, I get this error:

Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400, Request ID: XXX)" (RequestToken: XXX, HandlerErrorCode: InvalidRequest)

When running du node_modules/duckdb, I can see that the package is 284400 KB, i.e. about 284.4 MB. That is already over Lambda's 262144000-byte (250 MiB) unzipped limit on its own, so no Lambda that includes it can be deployed with Serverless. Is this the expected size of the duckdb package? If so, are there workarounds for the package size that DuckDB can support?

sean-legitscript · Mar 29 '24 20:03
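To check how close a bundle is to the limit in the error message above before deploying, a small Node script can total the file sizes under a directory. This is a minimal sketch, assuming a CommonJS Node setup; `dirSizeBytes` and `fitsInLambda` are hypothetical helper names, not part of duckdb or any AWS tooling. It sums apparent file sizes, so its result can differ somewhat from `du`, which reports allocated disk blocks, and the real limit applies to the whole unzipped function plus its layers, not just one package.

```javascript
const fs = require("fs");
const path = require("path");

// Limit quoted in the Lambda error message: 262144000 bytes (250 MiB).
const LAMBDA_UNZIPPED_LIMIT = 262144000;

// Hypothetical helper: recursively sums the sizes of regular files under `dir`.
// Symlinks and other special entries are skipped.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSizeBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical helper: reports the total and whether it is under the limit.
function fitsInLambda(dir, limit = LAMBDA_UNZIPPED_LIMIT) {
  const size = dirSizeBytes(dir);
  return { size, fits: size < limit };
}

module.exports = { dirSizeBytes, fitsInLambda, LAMBDA_UNZIPPED_LIMIT };
```

For example, `fitsInLambda("node_modules/duckdb")` returns the byte total for that directory and whether it stays under 262144000 bytes.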

I spun up an EC2 instance:

[ec2-user@ip-172-31-91-131 ~]$ sudo yum install npm
[ec2-user@ip-172-31-91-131 ~]$ npm install duckdb
[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/
113M	node_modules/duckdb/

Of which:

[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/*
4.0K	node_modules/duckdb/LICENSE
4.0K	node_modules/duckdb/Makefile
4.0K	node_modules/duckdb/README.md
24K	node_modules/duckdb/binding.gyp
4.0K	node_modules/duckdb/binding.gyp.in
4.0K	node_modules/duckdb/duckdb.js
53M	node_modules/duckdb/lib
4.0K	node_modules/duckdb/package.json
16K	node_modules/duckdb/scripts
60M	node_modules/duckdb/src
712K	node_modules/duckdb/test
4.0K	node_modules/duckdb/tsconfig.json
4.0K	node_modules/duckdb/vendor
8.0K	node_modules/duckdb/vendor.py

Of those, the src folder is optional: it can be removed and the package will still be functional.

Can you share how you got to 284.4 MB? Possibly by building from source?

carlopi · Apr 02 '24 10:04
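Following the point that `src` is optional, one possible workaround is to delete it from the installed package before bundling. A minimal sketch, assuming Node 14.14+ (for `fs.rmSync`) and a deployment bundle built from the local `node_modules`; `pruneDuckdb` is a hypothetical helper, and it refuses to touch `lib`, which the thread indicates is what the package needs at runtime.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: removes directories from an installed duckdb package
// (e.g. node_modules/duckdb) that are not needed at runtime. Never removes lib/.
function pruneDuckdb(pkgDir, dirs = ["src"]) {
  if (dirs.includes("lib")) {
    throw new Error("refusing to remove lib/: the package needs it at runtime");
  }
  const removed = [];
  for (const name of dirs) {
    const target = path.join(pkgDir, name);
    if (fs.existsSync(target)) {
      // Delete the whole directory tree.
      fs.rmSync(target, { recursive: true, force: true });
      removed.push(name);
    }
  }
  return removed; // names of the directories that were actually deleted
}
```

Calling `pruneDuckdb("node_modules/duckdb", ["src", "test"])` would also drop the test directory; running the Parquet parsing step once afterwards confirms the package still loads.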

@sean-legitscript you can try the DuckDB Lambda Node Layer I maintain: https://github.com/tobilg/duckdb-nodejs-layer. Also, the "normal" DuckDB package should only work on Node 20 runtimes, because all older Node runtimes use Amazon Linux 2, whose GLIBC is incompatible with the precompiled packages...

tobilg · Apr 17 '24 15:04
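Since the GLIBC mismatch only shows up when the native module loads, it can help to log which glibc a Lambda runtime actually provides. A minimal sketch using Node's built-in diagnostic report (`process.report.getReport()`, whose `header.glibcVersionRuntime` field is present on glibc-based Linux). The minimum version the prebuilt DuckDB binaries need is not stated in this thread, so `checkGlibc` takes it as a parameter; `glibcVersion`, `versionAtLeast`, and `checkGlibc` are hypothetical helpers.

```javascript
// Returns the glibc version string (e.g. "2.34") this Node process runs
// against, or null where there is no glibc (macOS, Windows, musl/Alpine).
function glibcVersion() {
  if (!process.report || typeof process.report.getReport !== "function") {
    return null; // very old Node without diagnostic reports
  }
  const header = process.report.getReport().header || {};
  return header.glibcVersionRuntime || null;
}

// Numeric comparison of dotted versions: "2.34" >= "2.28", but "2.3" < "2.28".
function versionAtLeast(actual, minimum) {
  const a = actual.split(".").map(Number);
  const m = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] || 0;
    const y = m[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}

// Hypothetical check: `minimum` must come from the binary's real requirement.
// `ok` is null when the platform has no glibc to compare.
function checkGlibc(minimum) {
  const version = glibcVersion();
  return { version, ok: version === null ? null : versionAtLeast(version, minimum) };
}
```

Logging `checkGlibc(minimum)` once at cold start in a handler shows which glibc the chosen runtime ships before `require("duckdb")` fails.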

@carlopi I think the src/ and test/ directories could be excluded before publishing (e.g. via .npmignore), right? They are not needed for the package to function IMO; only what's in lib/ is.

tobilg · Apr 17 '24 15:04

Any updates regarding my last comment, @carlopi? Thanks!

tobilg · May 29 '24 12:05