DuckDB package size
Hey all, we want to use duckdb for our parquet parsing needs in our lambda functions. I ran npm install duckdb and it installed without issue. I am also able to successfully parse my parquet files. The problem comes when trying to deploy my lambda stack. When running a deployment, I get the error:
Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400, Request ID: XXX)" (RequestToken: XXX, HandlerErrorCode: InvalidRequest)
When running du node_modules/duckdb, I can see that the package is 284400 KB, so 284.4 MB. This is way too big for any lambda to deploy with serverless. Is this the expected size of the duckdb package? If so, are there workarounds for this package size that duckdb can support?
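For reference, here is a rough sketch of the arithmetic behind those numbers. It assumes the du figure is in 1 KiB blocks (the GNU du default on Linux; other platforms or a BLOCKSIZE override may differ), and the variable names are only illustrative.

```shell
# Lambda's unzipped-size limit, copied from the error message.
limit_bytes=262144000

# Size reported by du, assumed to be in 1 KiB blocks.
pkg_kib=284400
pkg_bytes=$((pkg_kib * 1024))

# Integer division, so the results are rounded down to whole MiB.
echo "limit:   $((limit_bytes / 1024 / 1024)) MiB"               # 250
echo "package: $((pkg_bytes / 1024 / 1024)) MiB"                 # 277
echo "over by: $(((pkg_bytes - limit_bytes) / 1024 / 1024)) MiB" # 27
```

So under that assumption the duckdb directory alone is roughly 27 MiB over the limit, before counting the handler code or any other dependency.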
I spawned an EC2 instance:
[ec2-user@ip-172-31-91-131 ~]$ sudo yum install npm
[ec2-user@ip-172-31-91-131 ~]$ npm install duckdb
[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/
113M node_modules/duckdb/
or, broken down by entry:
[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/*
4.0K node_modules/duckdb/LICENSE
4.0K node_modules/duckdb/Makefile
4.0K node_modules/duckdb/README.md
24K node_modules/duckdb/binding.gyp
4.0K node_modules/duckdb/binding.gyp.in
4.0K node_modules/duckdb/duckdb.js
53M node_modules/duckdb/lib
4.0K node_modules/duckdb/package.json
16K node_modules/duckdb/scripts
60M node_modules/duckdb/src
712K node_modules/duckdb/test
4.0K node_modules/duckdb/tsconfig.json
4.0K node_modules/duckdb/vendor
8.0K node_modules/duckdb/vendor.py
Of those, the src folder is optional: it can be removed and the package will still be functional.
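A minimal sketch of how that could be done as a step before packaging. prune_duckdb is a hypothetical helper name, not something shipped with duckdb, and it assumes the prebuilt binary under lib/ is the one loaded at runtime.

```shell
# Hypothetical helper: drop the optional src/ directory from an installed
# duckdb package, then print the remaining size.
prune_duckdb() {
  pkg_dir="${1:-node_modules/duckdb}"
  if [ ! -d "$pkg_dir" ]; then
    echo "not a directory: $pkg_dir" >&2
    return 1
  fi
  # src/ holds the C++ sources; the compiled binding lives under lib/.
  rm -rf "$pkg_dir/src"
  du -sh "$pkg_dir"
}
```

It would be run after npm install and before the deployment step, for example: prune_duckdb node_modules/duckdb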
Can you share how you got to 284.4 MB? Possibly by building from source?
@sean-legitscript you can try to use the DuckDB Lambda Node Layer I maintain: https://github.com/tobilg/duckdb-nodejs-layer. Also, the "normal" DuckDB package should only work on Node 20 runtimes, because every older runtime uses Amazon Linux 2, which has GLIBC incompatibilities with the pre-compiled packages...
@carlopi I think the src/ and test/ directories could be removed before publishing (e.g. via .npmignore), right? IMO they are not needed for the package to function, only what's in lib/ is.
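Roughly what I have in mind, as a sketch only. write_npmignore is a hypothetical helper. It assumes the package does not already restrict its contents through a "files" list in package.json, and that nobody relies on the published sources to build from source when no prebuilt binary matches.

```shell
# Hypothetical helper: write an .npmignore that keeps src/ and test/
# out of the published tarball.
write_npmignore() {
  target_dir="${1:-.}"
  cat > "$target_dir/.npmignore" <<'EOF'
src/
test/
EOF
}
```

After calling write_npmignore in the package root, npm pack --dry-run lists the files that would be published, which makes it easy to check that only lib/ and the top-level files remain.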
Any updates regarding my last comment, @carlopi? Thanks!