Tailscale step runs successfully but subsequent steps that connect to the DB fail
We created the correct tags and set the scope to device.
The Tailscale step runs (though I don't see any confirmation that we are connected), but the step that runs my tests fails with:
ERROR tests/mycode/code/test_my_code.py - sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysqlserver.us-east-1.rds.amazonaws.com' (timed out)")
We also see the node being created in the Tailscale admin console, but I keep getting a timeout when I run pytest.
```yaml
name: Python application

on:
  push:
    branches: [ "feature/github-actions" ]
  pull_request:
    branches: [ "feature/github-actions" ]

env:
  AWS_CONFIG_FILE: .github/workflows/aws_config
  DB_NAME: "mydbname"
  DB_READ_SERVER: "mysqlserver.us-east-1.rds.amazonaws.com"
  DB_USERNAME: "root"
  DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
  AWS_PROFILE: "dev"
  API_VERSION: "v1"
  FRONT_END_KEY: ${{ secrets.FRONT_END_KEY }}
  LOG_LEVEL: "INFO"
  DB_USER_ID: 32
  SENTRY_SAMPLE_RATE: 1
  NUMEXPR_MAX_THREADS: "8"
  LOG_LEVEL_CONSOLE: True
  LOG_LEVEL_ALGORITHM: "INFO"
  LOG_LEVEL_DB: "WARNING"

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Tailscale
        uses: tailscale/github-action@v2
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:cicd
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v3
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          pip install -r requirements-dev.txt
      - name: Test with pytest
        env:
          PYTHONPATH: ${{ github.workspace }}/src
        run: |
          pytest
```
Switching the URL to a direct IP did the trick, so it looks like a DNS issue. I will leave this issue open, as I'd prefer not to use a direct IP.
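To confirm it is DNS rather than connectivity, a debug step along these lines can help (a sketch, not from the workflow above; `getent` and `resolvectl` are standard on the ubuntu-latest runner, and the hostname is the RDS endpoint used earlier):

```yaml
      - name: Debug DNS resolution
        run: |
          # Does the runner resolve the RDS hostname at all?
          getent hosts mysqlserver.us-east-1.rds.amazonaws.com || echo "hostname did not resolve"
          # What DNS configuration has Tailscale applied (MagicDNS / split DNS)?
          resolvectl status || true
          tailscale status
```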
I'm encountering a similar timeout error, although it doesn't seem to be DNS in my case, as the IP is resolved properly:
Error: Error connecting to PostgreSQL server database.us-east-1.rds.amazonaws.com (scheme: awspostgres): dial tcp correct.ip.address:5432: connect: connection timed out
@henworth Have you set up your security policies correctly for your Tailscale instance?
Yep, I've done all this. It was working fine before, and now I'm not sure what's wrong.
Connectivity to this DB works fine from other non-GitHub nodes, using either the hostname or the IP.
I also started having issues two weeks ago. I have also verified that things work fine outside of GitHub Actions using the same configuration.
I am having the same issue. It had been working perfectly so far, but today I'm getting random i/o timeouts.
Same here! I had random failures, especially on the first connection to our RDS instance (running in AWS) from a GitHub Actions worker (running in Azure). Subsequent connections after the first failure would succeed. I did some debugging and found that the connection was going through DERP despite the inbound WireGuard port being open for IPv4/IPv6 on the AWS side.
I changed our workflow to run a single ping to the subnet router's DNS hostname right after bringing up Tailscale, and that dramatically improved reliability, though I still had 1 failure in 10 (and that time it was the ping itself that failed).
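The extra step looks roughly like this (a sketch; `my-subnet-router` is a placeholder for the subnet router's MagicDNS name):

```yaml
      - name: Ping the subnet router
        run: |
          # "my-subnet-router" is a placeholder; use your subnet router's MagicDNS name or Tailscale IP
          ping -c 1 my-subnet-router
```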
I then set up Split DNS and haven't had a failure since, though I've only had 10 or so runs since then.
My issue turned out to be related to the stateful filtering added in v1.66.0. Once I disabled that on my subnet routers the problem disappeared.
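In case it helps others, disabling it looks roughly like this (a sketch; run on the subnet router itself, not in the workflow, and it requires Tailscale v1.66.0 or later):

```sh
# On the subnet router host: turn off stateful filtering
sudo tailscale set --stateful-filtering=false
```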
I wonder if there's a propagation delay here? E.g. a new node comes up but doesn't propagate fast enough. I wonder if adding a wait of 5 seconds or so would help. Maybe that's why pinging may have helped?
The stateful filtering is interesting, but it seems to be disabled by default.
@henworth Can you describe which flags you changed? I think I'm seeing something similar, but in the Helm world this time.
Update:
> `--stateful-filtering`: Enable stateful filtering for [subnet routers](https://tailscale.com/kb/1019/subnets) and [exit nodes](https://tailscale.com/kb/1103/exit-nodes). When enabled, inbound packets with another node's destination IP are dropped, unless they are a part of a tracked outbound connection from that node. Defaults to disabled.
Seems like the default is false?
At the time I wrote that comment the default was true; it has since been changed to false in a subsequent release.
v4.0.0 of the action now includes a `ping` parameter that you can use to specify which devices need to be reachable before your CI job proceeds. We are hopeful that this will resolve your issue. If it does not, please let us know by reopening the ticket.
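A sketch of how that could look in the workflow above (assuming the device the job needs to reach is a subnet router, here called `my-subnet-router` as a placeholder):

```yaml
      - name: Tailscale
        uses: tailscale/github-action@v4
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:cicd
          # wait until this device is reachable before later steps run
          ping: my-subnet-router
```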