swarm
Swarm.Tracker - fix `start_pid_remotely` retrying rapidly
I noticed this message many times in our logs:
`remote tracker on #{remote_node} went down during registration, retrying operation..`
It seems to happen randomly, but I think this change will fix it.
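For illustration only, here is a minimal sketch of the general idea of spacing out retries instead of retrying immediately. It is not the actual patch and does not use Swarm's internals: `RetrySketch`, `with_backoff`, and all of its parameters are hypothetical names, and it assumes the operation reports failure as `{:error, reason}`.

```elixir
defmodule RetrySketch do
  @moduledoc """
  Hypothetical helper, not part of Swarm: retries an operation with a
  capped exponential delay between attempts.
  """

  # Calls `fun` until it returns something other than {:error, _}
  # or until `attempts` calls have been made.
  # The delay starts at `base_ms` and doubles after each failure, capped at `max_ms`.
  def with_backoff(fun, attempts \\ 5, base_ms \\ 100, max_ms \\ 5_000)

  # Last attempt: return whatever the operation returns, success or error.
  def with_backoff(fun, 1, _base_ms, _max_ms), do: fun.()

  def with_backoff(fun, attempts, base_ms, max_ms) when attempts > 1 do
    case fun.() do
      {:error, _reason} ->
        # Wait before retrying so a persistent failure cannot flood the logs.
        Process.sleep(base_ms)
        with_backoff(fun, attempts - 1, min(base_ms * 2, max_ms), max_ms)

      other ->
        other
    end
  end
end
```

Note that `Process.sleep/1` blocks the calling process, so inside a tracker process a timer-based retry would probably be the better fit. The point of the sketch is only that a delay plus an attempt limit bounds how often the retry message can be logged.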
I saw this happen to us as well, just today, in our production cluster. There hadn't been any deploys in about 3 weeks, and everything ran smoothly until this happened.
When it happened, we got ~11M log lines in 2 hours as the tracker kept retrying without recovering on its own. We restarted the pods and everything went back to normal.
We are running 2 pods on a k8s cluster. Swarm is pulled in as a dependency of https://github.com/commanded/commanded-swarm-registry
We are using swarm 3.4.0.