Failed to get auth credentials: too many open files
See the GitHub issue for further discussion: https://github.com/pixie-io/pixie/issues/312
Running auth.login frequently fails with the error "too many open files". The known fix is to raise the open-file limit with ulimit. However, given how often we and our users hit this, we should investigate whether we can avoid triggering it in the first place.
@JamesMBartlett was able to track the number of open files with the following bpftrace probe:
sudo bpftrace -e 'kretprobe:alloc_fd /comm == "px"/{ printf("%s %d\n", comm, retval); }'
I also can't get px deploy to finish running. It always crashes after waiting for health checks.
I tried to fix it by setting ulimit to 10240 and deleting auth.json, still without success.
@scomri I just faced the same issue. It turns out that I had some AMQP traffic on my cluster, which, because of a bug, causes the PEM (Pixie Edge Module) to crash. The following worked for me:
px deploy --pem_flags stirling_enable_amqp_tracing=0
We have a bug fix for this crash: #946. It will be in the next release.
However, I would also suggest that our px deploy command should fail more gracefully. "Too many open files" is not a useful message in this case, and it points to some other problem in the px binary: rather than failing with "too many open files", px should avoid that failure and report something informative.