"Invalid host defined options" error in HttpCrawler when used with pkg after the ESM migration
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/http (HttpCrawler)
Issue description
Error
makdi-test | DEBUG HttpCrawler:SessionPool: No 'persistStateKeyValueStoreId' options specified, this session pool's data has been saved in the KeyValueStore with the id: ce096038-12bd-4473-ab69-d0e0fc741228
makdi-test | INFO HttpCrawler: Starting the crawler.
makdi-test | DEBUG HttpCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 1963 MB. Use the CRAWLEE_MEMORY_MBYTES or CRAWLEE_AVAILABLE_MEMORY_RATIO environment variable to override it.
makdi-test | DEBUG HttpCrawler:SessionPool: Created new Session - session_MrRYxu57S9
makdi-test | WARN HttpCrawler: Reclaiming failed request back to the list or queue. TypeError: Invalid host defined options
makdi-test | at HttpCrawler._requestFunction (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:423:13)
makdi-test | at /snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:94
makdi-test | at wrap (/snapshot/makdi/node_modules/@apify/timeout/index.js:52:27)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:66:7
makdi-test | at AsyncLocalStorage.run (node:async_hooks:319:14)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:65:13
makdi-test | at new Promise (<anonymous>)
makdi-test | at addTimeoutToPromise (/snapshot/makdi/node_modules/@apify/timeout/index.js:59:10)
makdi-test | at HttpCrawler._handleNavigation (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:76)
makdi-test | at async HttpCrawler._runRequestHandler (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:309:13) {"id":"3FULW9cbbkMrV4R","url":"https://crawlee.dev","retryCount":1}
makdi-test | DEBUG HttpCrawler:SessionPool: Created new Session - session_jF6xxRoKff
makdi-test | WARN HttpCrawler: Reclaiming failed request back to the list or queue. TypeError: Invalid host defined options
makdi-test | at HttpCrawler._requestFunction (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:423:13)
makdi-test | at /snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:94
makdi-test | at wrap (/snapshot/makdi/node_modules/@apify/timeout/index.js:52:27)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:66:7
makdi-test | at AsyncLocalStorage.run (node:async_hooks:319:14)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:65:13
makdi-test | at new Promise (<anonymous>)
makdi-test | at addTimeoutToPromise (/snapshot/makdi/node_modules/@apify/timeout/index.js:59:10)
makdi-test | at HttpCrawler._handleNavigation (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:76)
makdi-test | at async HttpCrawler._runRequestHandler (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:309:13) {"id":"3FULW9cbbkMrV4R","url":"https://crawlee.dev","retryCount":2}
makdi-test | DEBUG HttpCrawler:SessionPool: Created new Session - session_0RMbGUTi5D
makdi-test | WARN HttpCrawler: Reclaiming failed request back to the list or queue. TypeError: Invalid host defined options
makdi-test | at HttpCrawler._requestFunction (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:423:13)
makdi-test | at /snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:94
makdi-test | at wrap (/snapshot/makdi/node_modules/@apify/timeout/index.js:52:27)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:66:7
makdi-test | at AsyncLocalStorage.run (node:async_hooks:319:14)
makdi-test | at /snapshot/makdi/node_modules/@apify/timeout/index.js:65:13
makdi-test | at new Promise (<anonymous>)
makdi-test | at addTimeoutToPromise (/snapshot/makdi/node_modules/@apify/timeout/index.js:59:10)
makdi-test | at HttpCrawler._handleNavigation (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:366:76)
makdi-test | at async HttpCrawler._runRequestHandler (/snapshot/makdi/node_modules/@crawlee/http/internals/http-crawler.js:309:13) {"id":"3FULW9cbbkMrV4R","url":"https://crawlee.dev","retryCount":3}
Works fine with [email protected] (before the ESM migration), fails since [email protected].
Refer to the Dockerfile for reference: https://github.com/teammakdi/makdi-test/blob/main/Dockerfile#L14
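For context, an assumption on my part rather than a verified diagnosis: V8 throws `TypeError: Invalid host defined options` when a dynamic `import()` is evaluated from a script compiled without host-defined options, which is what happens to code running from pkg's snapshot filesystem. Since the ESM migration, the CommonJS build of Crawlee appears to load ESM-only dependencies lazily via `import()`, which would explain why the same code works on the host but breaks inside the pkg binary. A minimal standalone sketch that hits the same class of error when packaged with pkg (the file name and dependency choice are illustrative only):

```js
// index.js — hypothetical standalone example, not taken from the linked repo.
// When this file is bundled with `pkg` and executed from the snapshot
// filesystem, the dynamic import() below can throw
// "TypeError: Invalid host defined options", because snapshotted scripts are
// compiled without V8 host-defined options.
async function main() {
    // Lazily load an ESM-only package from CommonJS code, similar in spirit to
    // how @crawlee/http loads its HTTP client after the ESM migration
    // (an assumption about the internals, not confirmed here).
    const { gotScraping } = await import('got-scraping');
    const { body } = await gotScraping('https://crawlee.dev');
    console.log(`Fetched ${body.length} bytes`);
}

main().catch(console.error);
```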
Code sample
https://github.com/teammakdi/makdi-test
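For completeness, a minimal reproduction sketch; this is my own approximation of what the linked project runs, and the actual code may differ (the file name `main.js` is hypothetical):

```js
// main.js — hypothetical minimal reproduction, packaged with pkg and run
// inside Docker as described above.
const { HttpCrawler } = require('@crawlee/http');

const crawler = new HttpCrawler({
    // Log the fetched page; any handler body reproduces the failure, since
    // the error above is thrown during navigation, before the handler runs.
    async requestHandler({ request, body, log }) {
        log.info(`Fetched ${request.url} (${body.length} bytes)`);
    },
});

crawler.run(['https://crawlee.dev']).catch((err) => {
    console.error(err);
    process.exit(1);
});
```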
Package version
3.6.2
Node.js version
18.17.0
Operating system
Ubuntu
Apify platform
- [ ] Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
Hey, I'm unable to reproduce the issue you're having when running the code in your example repository. Could you confirm if you can consistently reproduce this issue? Does it happen in Docker?
Hey @vladfrangu,
Yes, it only fails via Docker; it works fine when run directly on the host machine via Node.
Also, inside Docker it fails when used with https://github.com/vercel/pkg.
You can check out the test project and run it via Docker:
docker-compose up --build
Is there any reason you use pkg in Docker? 👀
Could you also try to reproduce this without pkg?
Hey @vladfrangu, it is reproducible only with the pkg tool.
We have a requirement to ship our app as a single binary, hence we use pkg.
However, it works with [email protected], not with [email protected].
Hmm... I'll try to take a look and see if I can find out why it's breaking when run via pkg, but I can't make any promises. 😅