cli icon indicating copy to clipboard operation
cli copied to clipboard

CLI Bug: Supabase does not start on certain networks - ** (ErlangError) Erlang error: {:erpc, :timeout}

Open kerren opened this issue 1 year ago • 1 comments

Describe the bug I am unable to start a fresh Supabase project on certain networks, it starts the database and as soon as that's up I get the message Stopping containers... and then I get error running container: exit 1.

Update: While busy writing out the debugging I decided to switch to my mobile data and that fixed it so this definitely has something to do with our firewall / internal network. I've added this report for anyone that may be experiencing the same problem. There is a workaround that I think everyone should be aware of.

To Reproduce Unfortunately, this doesn't seem to be reproducible by other people with the same CLI version. However, in my case, I would do the following:

Steps to reproduce the behavior:

  1. cd into my supabase folder
  2. supabase stop --no-backup
  3. rm -rf .temp
  4. supabase start --debug

Expected behavior I would expect the Supabase docker containers to start up correctly and I should be able to use it, instead, the project closes down.

Current Debugging I've added this section to outline some of the debugging I've already tried.

  • Downgrade the supabase cli to version 1.172.2 (based on a different closed github issue)
  • Use npx instead of homebrew for both 1.172.2 and 1.178.2
  • Try the latest beta version on homebrew through supabase/tap/supabase-beta
  • Downgrade my version of Docker from 26.1.4 to 25.0.5
  • Run supabase start -x realtime, what's interesting here is that the realtime container still started
  • Disabled my private DNS to ensure no resolvers were failing

Finally, I tried this:

  • Changed to my mobile network to ensure the firewall/network is not causing a problem - this worked!

Screenshots

I have logs from the realtime container

08:36:22.092 [debug] QUERY OK db=11.1ms
commit []
08:36:22.375 [notice]     :alarm_handler: {:set, {:system_memory_high_watermark, []}}
08:36:22.819 [info] Elixir.Realtime.SignalHandler is being initialized...
08:36:22.825 [notice] SYN[nonode@nohost] Adding node to scope <Elixir.Realtime.Tenants.Connect>
08:36:22.827 [notice] SYN[nonode@nohost] Creating tables for scope <Elixir.Realtime.Tenants.Connect>
08:36:22.832 [notice] SYN[nonode@nohost|registry<Elixir.Realtime.Tenants.Connect>] Discovering the cluster
08:36:22.834 [notice] SYN[nonode@nohost|pg<Elixir.Realtime.Tenants.Connect>] Discovering the cluster
08:36:22.834 [notice] SYN[nonode@nohost] Adding node to scope <users>
08:36:22.834 [notice] SYN[nonode@nohost] Creating tables for scope <users>
08:36:22.834 [notice] SYN[nonode@nohost|registry<users>] Discovering the cluster
08:36:22.835 [notice] SYN[nonode@nohost|pg<users>] Discovering the cluster
08:36:22.835 [notice] SYN[nonode@nohost] Adding node to scope <Elixir.RegionNodes>
08:36:22.835 [notice] SYN[nonode@nohost] Creating tables for scope <Elixir.RegionNodes>
08:36:22.835 [notice] SYN[nonode@nohost|registry<Elixir.RegionNodes>] Discovering the cluster
08:36:22.835 [notice] SYN[nonode@nohost|pg<Elixir.RegionNodes>] Discovering the cluster
08:36:22.837 [warning] Replica region not found, defaulting to Realtime.Repo
08:36:25.795 [debug] Tzdata polling for update.
08:36:27.314 [info] tzdata release in place is from a file last modified Fri, 22 Oct 2021 02:20:47 GMT. Release file on server was last modified Thu, 01 Feb 2024 18:40:48 GMT.
08:36:27.321 [debug] Tzdata downloading new data from https://data.iana.org/time-zones/tzdata-latest.tar.gz
08:36:28.188 [debug] Tzdata data downloaded. Release version 2024a.
08:36:28.525 [info] Tzdata has updated the release from 2021e to 2024a
08:36:28.525 [debug] Tzdata deleting ETS table for version 2021e
08:36:28.526 [debug] Tzdata deleting ETS table file for version 2021e
08:36:29.079 [info] Running RealtimeWeb.Endpoint with cowboy 2.10.0 at :::4000 (http)
08:36:29.082 [info] Access RealtimeWeb.Endpoint at http://realtime.fly.dev
08:36:29.104 [notice] SYN[nonode@nohost] Adding node to scope <Elixir.PostgresCdcStream>
08:36:29.104 [notice] SYN[nonode@nohost] Creating tables for scope <Elixir.PostgresCdcStream>
08:36:29.104 [notice] SYN[nonode@nohost|registry<Elixir.PostgresCdcStream>] Discovering the cluster
08:36:29.104 [notice] SYN[nonode@nohost|pg<Elixir.PostgresCdcStream>] Discovering the cluster
08:36:29.221 [notice] SYN[nonode@nohost] Adding node to scope <Elixir.Extensions.PostgresCdcRls>
08:36:29.221 [notice] SYN[nonode@nohost] Creating tables for scope <Elixir.Extensions.PostgresCdcRls>
08:36:29.221 [notice] SYN[nonode@nohost|registry<Elixir.Extensions.PostgresCdcRls>] Discovering the cluster
08:36:29.221 [notice] SYN[nonode@nohost|pg<Elixir.Extensions.PostgresCdcRls>] Discovering the cluster
08:36:29.231 [warning] Replica region not found, defaulting to Realtime.Repo
08:36:29.235 [debug] QUERY OK source="tenants" db=1.3ms queue=1.4ms idle=226.0ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."jwt_jwks", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."suspend", t0."enable_authorization", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
08:36:29.251 [debug] QUERY OK source="extensions" db=0.8ms queue=1.3ms idle=239.1ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
08:36:29.265 [warning] Zero region nodes for iad using :nonode@nohost
08:36:29.271 project=realtime-dev external_id=realtime-dev [warning] Replica region not found, defaulting to Realtime.Repo
08:36:29.274 project=realtime-dev external_id=realtime-dev [debug] QUERY OK source="tenants" db=2.6ms idle=262.0ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."jwt_jwks", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."suspend", t0."enable_authorization", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
08:36:29.276 project=realtime-dev external_id=realtime-dev [debug] QUERY OK source="extensions" db=1.4ms idle=257.4ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
** (ErlangError) Erlang error: {:erpc, :timeout}
    (kernel 8.5.4) erpc.erl:714: :erpc.call/5
    (stdlib 4.3) timer.erl:235: :timer.tc/1
    (realtime 2.28.32) lib/realtime/rpc.ex:37: Realtime.Rpc.enhanced_call/5
    (realtime 2.28.32) lib/realtime/tenants.ex:87: Realtime.Tenants.health_check/1
    nofile:1: (file)
    (stdlib 4.3) erl_eval.erl:748: :erl_eval.do_apply/7
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] memory supervisor port (memsup): Erlang has closed

System information Rerun the failing command with --create-ticket flag.

  • Ticket ID: 051b4388e7cc4a169259602462f6adb4
  • Version of OS: Ubuntu 22.04
  • Version of CLI: 1.172.2
  • Version of Docker: 26.1.4
  • Versions of services:
        SERVICE IMAGE      │      LOCAL       │ LINKED
  ─────────────────────────┼──────────────────┼─────────
    supabase/postgres      │ 15.1.1.61        │ -
    supabase/gotrue        │ v2.151.0         │ -
    postgrest/postgrest    │ v12.0.1          │ -
    supabase/realtime      │ v2.28.32         │ -
    supabase/storage-api   │ v1.0.6           │ -
    supabase/edge-runtime  │ v1.54.6          │ -
    supabase/studio        │ 20240422-5cf8f30 │ -
    supabase/postgres-meta │ v0.80.0          │ -
    supabase/logflare      │ 1.4.0            │ -
    supabase/supavisor     │ 1.1.56           │ -
    darthsim/imgproxy      │ v3.8.0           │ -

Additional context

I thought I'd post this here for anyone else experiencing the problem because, unfortunately, the logs aren't descriptive enough in the realtime container to isolate this issue. Maybe they need to output the IP address that they're trying to hit so that it can be debugged? My hunch right now is that this might have to do with the Supabase network that's automatically created. Perhaps there's a subnet clash? I'm really not sure.

But I do think, for anyone experiencing this, try switch your network and see if that resolves it. From what I can see, once you've started up the database and there is a backup to restore from, the health checks all work and then you can use your other network.

kerren avatar Jun 25 '24 09:06 kerren

The error seems to be pointing to realtime health check / migrations failing. Could you try again using latest CLI since we updated realtime? #2444

If the error persists, you can also try creating a custom docker network for cli to connect. See https://github.com/supabase/cli/issues/1397#issuecomment-2153837111

sweatybridge avatar Jun 27 '24 09:06 sweatybridge