
network failure on init causes application crash

Open mccraigmccraig opened this issue 9 months ago • 5 comments

Environment

  • Elixir & Erlang/OTP versions (elixir --version):
    • elixir 1.18.3-otp-27
    • erlang 27.3.3
  • Operating system:
    • MacOS & Ubuntu
  • Pigeon version:
    • pigeon 2.0.1

Current behavior

When our application is started without a network (e.g. on a laptop with no active connection), the Pigeon FCM and APNS supervisors crash and cause the application to fail to start.

This isn't great for on-the-move development, but it's easy to work around in a local environment.

More concerning is that our application startup now depends on Pigeon successfully initiating connections to Firebase and APNS. I can imagine a situation where some 3rd-party network failure leaves our application pods unable to start, and because K8s load balancing scales pods up and down, the whole API would soon die.

Here's the error I'm seeing on application startup:

** (Mix) Could not start application backend: Backend.Application.start(:normal, []) returned an error: shutdown: failed to start child: Backend.PushNotifications.FCM
    ** (EXIT) shutdown: failed to start child: 1
        ** (EXIT) {{{:badmatch, {:error, {{:badmatch, {:error, :nxdomain}}, [{Kadabra.Connection, :init, 1, [file: ~c"lib/connection.ex", line: 44]}, {:gen_server, :init_it, 2, [file: ~c"gen_server.erl", line: 2229]}, {:gen_server, :init_it, 6, [file: ~c"gen_server.erl", line: 2184]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}]}}}, [{Kadabra.ConnectionPool, :init, 1, [file: ~c"lib/connection_pool.ex", line: 59]}, {:gen_server, :init_it, 2, [file: ~c"gen_server.erl", line: 2229]}, {:gen_server, :init_it, 6, [file: ~c"gen_server.erl", line: 2184]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}]}, {:child, :undefined, #Reference<0.1652344825.142606341.32331>, {Kadabra.ConnectionPool, :start_link, [%URI{scheme: "https", authority: "fcm.googleapis.com", userinfo: nil, host: "fcm.googleapis.com", port: 443, path: nil, query: nil, fragment: nil}, #PID<0.1309.0>, [ssl: [{:active, :once}, {:packet, :raw}, {:reuseaddr, true}, {:alpn_advertised_protocols, ["h2"]}, :binary]]]}, :transient, false, 5000, :worker, [Kadabra.ConnectionPool]}}

Expected behavior

If the configuration is valid, a failure to open a connection to Firebase or APNS should not cause the Pigeon supervisors to fail.

mccraigmccraig avatar May 09 '25 16:05 mccraigmccraig

Thanks for bringing this to my attention. I'm addressing it in the next release. It's similar to the nasty errors that crop up when an APNS certificate expires.

Unfortunately all of this is related to some deeper issues with Kadabra, so it'll take some work to revise the internals. However, the end result should be very similar to how db_connection behaves when a database is offline.

hpopp avatar May 20 '25 17:05 hpopp

> Thanks for bringing this to my attention. I'm addressing it in the next release. It's similar to the nasty errors that crop up when an APNS certificate expires.

This is good to know. We were planning on using a custom supervisor with exponential backoff on restarts. But if we can count on Kadabra not getting into a restart loop fast enough to kill the supervisor, that's all we need.
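
For reference, the exponential-backoff idea mentioned above could be sketched roughly as follows. This is not Pigeon's API and not the custom supervisor described in this comment: `BackoffStarter`, its `delay/3` helper, and the 1 s / 60 s limits are hypothetical names and values chosen for illustration. It is a plain GenServer that keeps retrying a start function, assuming that function returns `{:ok, pid}` or an error value.

```elixir
defmodule BackoffStarter do
  @moduledoc """
  Hypothetical helper (not part of Pigeon): starts a child process via a
  zero-arity function and retries with exponential backoff on failure.
  """
  use GenServer

  # Assumed defaults for illustration: 1 s base delay, capped at 60 s.
  @base_ms 1_000
  @max_ms 60_000

  def start_link(start_fun) when is_function(start_fun, 0) do
    GenServer.start_link(__MODULE__, start_fun)
  end

  # Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
  def delay(attempt, base \\ @base_ms, max \\ @max_ms) do
    min(base * Integer.pow(2, attempt), max)
  end

  @impl true
  def init(start_fun) do
    # Trap exits so a linked child that dies does not take this process down.
    Process.flag(:trap_exit, true)
    # Return immediately; the first start attempt happens after init completes.
    send(self(), :connect)
    {:ok, %{start_fun: start_fun, attempt: 0, child: nil}}
  end

  @impl true
  def handle_info(:connect, state) do
    case safe_start(state.start_fun) do
      {:ok, pid} when is_pid(pid) ->
        {:noreply, %{state | child: pid, attempt: 0}}

      _error ->
        {:noreply, schedule_retry(state)}
    end
  end

  # The running child exited: forget it and schedule a restart.
  def handle_info({:EXIT, pid, _reason}, %{child: pid} = state) do
    {:noreply, schedule_retry(%{state | child: nil})}
  end

  # Exit messages from failed start attempts are ignored.
  def handle_info({:EXIT, _pid, _reason}, state), do: {:noreply, state}

  defp schedule_retry(state) do
    Process.send_after(self(), :connect, delay(state.attempt))
    %{state | attempt: state.attempt + 1}
  end

  # Convert raises, throws, and exits from the start function into an error tuple.
  defp safe_start(fun) do
    fun.()
  catch
    kind, reason -> {:error, {kind, reason}}
  end
end
```

Under these assumptions, the application would supervise something like `BackoffStarter.start_link(fn -> Backend.PushNotifications.FCM.start_link([]) end)` instead of the dispatcher directly, so a failed connection no longer fails application startup. Whether the dispatcher's `start_link` accepts those arguments depends on how the module is defined in the application.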

mikebveil avatar May 27 '25 17:05 mikebveil

I'm also seeing this error (even though the computer has an active internet connection) but only on my Mac. It's working fine on an Ubuntu server. The difference that I see is the OTP version. I noticed the issue right after I updated that on my Mac. It might be related or not 🤷🏻‍♂️

Both are running pigeon 2.0.1

Ubuntu 22.04.5 LTS
Erlang/OTP 27 [erts-15.2.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [jit]
Elixir 1.18.3 (compiled with Erlang/OTP 27)

macOS 15.5
Erlang/OTP 28 [erts-16.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit] [dtrace]
Elixir 1.18.4 (compiled with Erlang/OTP 27)

dma-kingdom avatar May 30 '25 09:05 dma-kingdom

Running into the same issue on OTP 28 on my MacBook. It worked fine under Elixir 1.17.4 + OTP 25. Not sure how I can provide more info on this. Has anybody fixed this issue? This actively prevents me from updating to Elixir 1.18.x. Running pigeon 2.0.1.


Solution: I solved it by pinning the master branch in my mix.exs file. The master branch uses the Mint adapter instead of Kadabra.
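
The exact dependency entry was not posted. A pin of that kind might look like the fragment below, which assumes the repository lives at `codedge-llc/pigeon` on GitHub and that the rest of `deps/0` is unchanged.

```elixir
# mix.exs (fragment)
defp deps do
  [
    # Assumption: repository path; the branch name comes from the comment above.
    # Replaces the Hex entry for pigeon 2.0.1.
    {:pigeon, github: "codedge-llc/pigeon", branch: "master"}
  ]
end
```

After changing the entry, `mix deps.get` fetches the branch. Tracking a branch means later commits can change behavior, so pinning a specific commit with `ref:` instead of `branch:` may be safer.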

aspala avatar Sep 17 '25 16:09 aspala