dex icon indicating copy to clipboard operation
dex copied to clipboard

Token refresh hangs when using local passwords on SQLite

Open tsardelli-dh opened this issue 9 months ago • 3 comments

Preflight Checklist

  • [x] I agree to follow the Code of Conduct that this project adheres to.
  • [x] I have searched the issue tracker for an issue that matches the one I want to file, without success.
  • [x] I am not looking for support or already pursued the available support channels without success.

Version

master

Storage Type

SQLite

Installation Type

Other (specify below)

Expected Behavior

I should be able to refresh the token when using username/password authentication and SQLite for storage

Actual Behavior

I stumbled into this weird error when using dex in my application, but I was finally able to reproduce it with the example-app included in this repo. What happens is that, when I enable local passwords (enablePasswordDB: true) with a SQLite backend, dex hangs when I try to refresh the token and I have to SIGKILL it.

Steps To Reproduce

Build and run dex with the example configuration provided below

make build && ./bin/dex serve config.yaml
# config.yaml

issuer: http://127.0.0.1:5556/dex
storage:
  type: sqlite3
  config:
    file: examples/dex.db
web:
  http: 0.0.0.0:5556
telemetry:
  http: 0.0.0.0:5558
logger:
  level: "debug"
staticClients:
  - id: example-app
    redirectURIs:
      - "http://127.0.0.1:5555/callback"
    name: "Example App"
    secret: ZXhhbXBsZS1hcHAtc2VjcmV0
connectors:
  - type: mockCallback
    id: mock
    name: Example
enablePasswordDB: true
staticPasswords:
  - email: "[email protected]"
    hash: "$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
    username: "admin"
    userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"

Build and run the example-app:

make examples && ./bin/example-app

Open the sqlite database defined in the configuration file with sqlite3 examples/dex.db and insert a user (the second value is the encoded password which is password)

INSERT INTO password (email, hash, username, user_id)
VALUES (
    '[email protected]',
    X'243261243130243262326355384350684f5461477273314852517541756553374a5454355a487348537a596946506d316c655a636b374d6338543457',
    'test',
    '24ec23e3-11ca-41e1-b8a7-c53d7242176e'
);

In your browser:

  • Visit http://127.0.0.1:5555
  • Click on "Log in" -> Log in with Email
  • Insert email [email protected] and password password
  • Click on "Login" and then "Grant Access"
  • Click on "Redeem refresh token"

At this point dex will become unresponsive and the callback request will be pending forever. The only way to resume dex is to SIGKILL it.

Additional Information

I didn't try with other SQL backends, but the error doesn't happen with the static password (e.g. the admin user defined in the config file) or the Microsoft connector.

Setting db.SetMaxOpenConns(2) in storage/sql/sqlite.go "fixes" the issue but it's not the right solution. It makes me think though that another query that runs before the refresh is keeping the connection open, but I couldn't find out which one yet.

This is not a recent issue. I have been affected by it since v2.35 if I remember correctly. At the time I managed to avoid it by reverting commit https://github.com/dexidp/dex/commit/4b5f1d52, but even looking at the diff in the commit, I couldn't find out the culprit.

Below I provide the logs that I collect when reproducing the issue as described above (I added some debug log myself that you will see in there).

Configuration

issuer: http://127.0.0.1:5556/dex
storage:
  type: sqlite3
  config:
    file: examples/dex.db
web:
  http: 0.0.0.0:5556
telemetry:
  http: 0.0.0.0:5558
logger:
  level: "debug"
staticClients:
  - id: example-app
    redirectURIs:
      - "http://127.0.0.1:5555/callback"
    name: "Example App"
    secret: ZXhhbXBsZS1hcHAtc2VjcmV0
connectors:
  - type: mockCallback
    id: mock
    name: Example
enablePasswordDB: true
staticPasswords:
  - email: "[email protected]"
    hash: "$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
    username: "admin"
    userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"

Logs

time=2025-07-04T18:26:56.387+02:00 level=INFO msg="Version info" dex_version=c1337c4b1dfc56f905564147613cf2d723a8ce1a-dirty go.version=go1.24.4 go.os=linux go.arch=amd64
time=2025-07-04T18:26:56.387+02:00 level=INFO msg="config using log level" level=DEBUG
time=2025-07-04T18:26:56.387+02:00 level=INFO msg="config issuer" issuer=http://127.0.0.1:5556/dex
time=2025-07-04T18:26:56.387+02:00 level=DEBUG msg="Set SQLite max connections to " !BADKEY=1
time=2025-07-04T18:26:56.388+02:00 level=DEBUG msg="Running trans.conn.QueryRow"
time=2025-07-04T18:26:56.388+02:00 level=INFO msg="config storage" storage_type=sqlite3
time=2025-07-04T18:26:56.388+02:00 level=INFO msg="config static client" client_name="Example App"
time=2025-07-04T18:26:56.388+02:00 level=INFO msg="config connector" connector_id=mock
time=2025-07-04T18:26:56.388+02:00 level=INFO msg="config connector: local passwords enabled"
time=2025-07-04T18:26:56.388+02:00 level=INFO msg="config refresh tokens rotation" enabled=true
time=2025-07-04T18:26:56.389+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tverification_keys, signing_key, signing_key_pub, next_rotation\n\t\tfrom keys\n\t\twhere id=?\n\t" "args="=[keys] "translatedArgs="=[keys]
time=2025-07-04T18:26:56.389+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0001320a0}"
time=2025-07-04T18:26:56.389+02:00 level=INFO msg="listening on" server=telemetry address=0.0.0.0:5558
time=2025-07-04T18:26:56.389+02:00 level=INFO msg="listening on" server=http address=0.0.0.0:5556
time=2025-07-04T18:29:57.764+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, response_types, scopes, redirect_uri, nonce, state,\n\t\t\tforce_approval_prompt, logged_in,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified, claims_groups,\n\t\t\tconnector_id, connector_data, expiry,\n\t\t\tcode_challenge, code_challenge_method, hmac_key\n\t\tfrom auth_request where id = ?;\n\t" "args="=[tkr2cftwtudogxuvkgq2bsmph] "translatedArgs="=[tkr2cftwtudogxuvkgq2bsmph]
time=2025-07-04T18:29:57.765+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0005b9a40}"
time=2025-07-04T18:30:32.220+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, response_types, scopes, redirect_uri, nonce, state,\n\t\t\tforce_approval_prompt, logged_in,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified, claims_groups,\n\t\t\tconnector_id, connector_data, expiry,\n\t\t\tcode_challenge, code_challenge_method, hmac_key\n\t\tfrom auth_request where id = ?;\n\t" "args="=[tkr2cftwtudogxuvkgq2bsmph] "translatedArgs="=[tkr2cftwtudogxuvkgq2bsmph]
time=2025-07-04T18:30:32.220+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0005b9a40}"
time=2025-07-04T18:30:32.221+02:00 level=DEBUG msg="trying to do password login from handlePasswordLogin" [email protected] password=password scopes="{OfflineAccess:true Groups:false}"
2025/07/04 18:30:32 Inside GetPassword() in storage/sql/crud.go
2025/07/04 18:30:32 Inside getPassword() in storage/sql/crud.go. email= [email protected]
time=2025-07-04T18:30:32.221+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\temail, hash, username, user_id\n\t\tfrom password where email = ?;\n\t" "args="=[[email protected]] "translatedArgs="=[[email protected]]
time=2025-07-04T18:30:32.221+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0005b9d60}"
2025/07/04 18:30:32 Running scanPassword in storage/sql/crud.go. scanner= &{<nil> 0xc0005b9d60}
2025/07/04 18:30:32 Run Scan in scanPassword. password= {[email protected] [36 50 97 36 49 48 36 50 98 50 99 85 56 67 80 104 79 84 97 71 114 115 49 72 82 81 117 65 117 101 83 55 74 84 84 53 90 72 115 72 83 122 89 105 70 80 109 49 108 101 90 99 107 55 77 99 56 84 52 87]  test 24ec23e3-11ca-41e1-b8a7-c53d7242176e}
time=2025-07-04T18:30:32.271+02:00 level=DEBUG msg="Running trans.conn.QueryRow"
time=2025-07-04T18:30:32.272+02:00 level=INFO msg="login successful" connector_id=local username=test preferred_username="" [email protected] groups=[] request_id=4ce3c2c8-043d-4685-92e0-0d93d0f3a705
time=2025-07-04T18:30:32.272+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tuser_id, conn_id, refresh, connector_data\n\t\tfrom offline_session\n\t\twhere user_id = ? AND conn_id = ?;\n\t\t" "args="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]" "translatedArgs="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]"
time=2025-07-04T18:30:32.272+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0005b9ea0}"
time=2025-07-04T18:30:32.272+02:00 level=DEBUG msg="Running trans.conn.QueryRow"
time=2025-07-04T18:30:32.274+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, response_types, scopes, redirect_uri, nonce, state,\n\t\t\tforce_approval_prompt, logged_in,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified, claims_groups,\n\t\t\tconnector_id, connector_data, expiry,\n\t\t\tcode_challenge, code_challenge_method, hmac_key\n\t\tfrom auth_request where id = ?;\n\t" "args="=[tkr2cftwtudogxuvkgq2bsmph] "translatedArgs="=[tkr2cftwtudogxuvkgq2bsmph]
time=2025-07-04T18:30:32.274+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0004e8960}"
time=2025-07-04T18:30:47.885+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, response_types, scopes, redirect_uri, nonce, state,\n\t\t\tforce_approval_prompt, logged_in,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified, claims_groups,\n\t\t\tconnector_id, connector_data, expiry,\n\t\t\tcode_challenge, code_challenge_method, hmac_key\n\t\tfrom auth_request where id = ?;\n\t" "args="=[tkr2cftwtudogxuvkgq2bsmph] "translatedArgs="=[tkr2cftwtudogxuvkgq2bsmph]
time=2025-07-04T18:30:47.885+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0004e8c80}"
time=2025-07-04T18:30:47.891+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, scopes, nonce, redirect_uri,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified, claims_groups,\n\t\t\tconnector_id, connector_data,\n\t\t\texpiry,\n\t\t\tcode_challenge, code_challenge_method\n\t\tfrom auth_code where id = ?;\n\t" "args="=[u4dam2r5zaqt5oqgsbg7r7tvb] "translatedArgs="=[u4dam2r5zaqt5oqgsbg7r7tvb]
time=2025-07-04T18:30:47.891+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0001320a0}"
time=2025-07-04T18:30:47.894+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tuser_id, conn_id, refresh, connector_data\n\t\tfrom offline_session\n\t\twhere user_id = ? AND conn_id = ?;\n\t\t" "args="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]" "translatedArgs="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]"
time=2025-07-04T18:30:47.894+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc000132780}"
time=2025-07-04T18:30:47.894+02:00 level=DEBUG msg="Running trans.conn.QueryRow"
ktime=2025-07-04T18:35:53.262+02:00 level=DEBUG msg="Handling requesto to refresh token in handleRefreshToken" "clientID="=example-app
time=2025-07-04T18:35:53.262+02:00 level=DEBUG msg="Starting getRefreshTokenFromStorage()" "clientID="=example-app "token="="refresh_id:\"mb37il2slzquhzdftu4v56doq\"  token:\"xowl7zocinh53rrmzsc43kgwl\""
time=2025-07-04T18:35:53.262+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tid, client_id, scopes, nonce,\n\t\t\tclaims_user_id, claims_username, claims_preferred_username,\n\t\t\tclaims_email, claims_email_verified,\n\t\t\tclaims_groups,\n\t\t\tconnector_id, connector_data,\n\t\t\ttoken, obsolete_token, created_at, last_used\n\t\tfrom refresh_token where id = ?;\n\t" "args="=[mb37il2slzquhzdftu4v56doq] "translatedArgs="=[mb37il2slzquhzdftu4v56doq]
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0004e8320}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Got refresh token from storage" "refresh="="{ID:mb37il2slzquhzdftu4v56doq Token:xowl7zocinh53rrmzsc43kgwl ObsoleteToken: CreatedAt:2025-07-04 16:30:47.894085283 +0000 UTC LastUsed:2025-07-04 16:30:47.894089055 +0000 UTC ClientID:example-app ConnectorID:local ConnectorData:[] Claims:{UserID:24ec23e3-11ca-41e1-b8a7-c53d7242176e Username:test PreferredUsername: Email:[email protected] EmailVerified:true Groups:[]} Scopes:[openid profile email offline_access] Nonce:}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Succesfully assigned connector" "connector="="{ResourceVersion: Connector:{s:0xc00011e240}}" "connector_id="=local
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\tuser_id, conn_id, refresh, connector_data\n\t\tfrom offline_session\n\t\twhere user_id = ? AND conn_id = ?;\n\t\t" "args="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]" "translatedArgs="="[24ec23e3-11ca-41e1-b8a7-c53d7242176e local]"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Row is" "row="="&{err:<nil> rows:0xc0004e83c0}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Using offline session connector data" "ConnectorData="=""
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Hydrated refresh token from storage" "rCtx="="&{storageToken:0xc0001b3c20 requestToken:0xc00019b040 connector:{ResourceVersion: Connector:{s:0xc00011e240}} connectorData:[] scopes:[]}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Entered into updateRefreshToken" "ctx="="context.Background.WithValue(net/http context value http-server, *http.Server).WithValue(net/http context value local-addr, 127.0.0.1:5556).WithCancel.WithCancel.WithValue(mux.contextKey, map[string]string).WithValue(mux.contextKey, *mux.Route).WithValue(server.logRequestKey, e46ba037-fb4b-4c73-a8e9-088931b56568)" "rCtx="="&{storageToken:0xc0001b3c20 requestToken:0xc00019b040 connector:{ResourceVersion: Connector:{s:0xc00011e240}} connectorData:[] scopes:[openid profile email offline_access]}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Running trans.conn.QueryRow"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Ready to call s.refreshWithConnector from updateRefreshToken" "ident="="{UserID:24ec23e3-11ca-41e1-b8a7-c53d7242176e Username:test PreferredUsername: Email:[email protected] EmailVerified:true Groups:[] ConnectorData:[]}"
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="connector data" "ident="="{UserID:24ec23e3-11ca-41e1-b8a7-c53d7242176e Username:test PreferredUsername: Email:[email protected] EmailVerified:true Groups:[] ConnectorData:[]}" "rctx_connector_data="=""
2025/07/04 18:35:53 Ready to call db.s.GetPassword from Refresh() in server.go
2025/07/04 18:35:53 Inside GetPassword() in storage/sql/crud.go
2025/07/04 18:35:53 Inside getPassword() in storage/sql/crud.go. email= [email protected]
time=2025-07-04T18:35:53.263+02:00 level=DEBUG msg="Executing SQL query" "query="="\n\t\tselect\n\t\t\temail, hash, username, user_id\n\t\tfrom password where email = ?;\n\t" "args="=[[email protected]] "translatedArgs="=[[email protected]]
----> Here is where it hangs

tsardelli-dh avatar Jul 04 '25 16:07 tsardelli-dh

We have encountered with the same problem but with Postgresql backend. Our investigation led to almost the same result. It hangs while trying to refresh token, but we didn't find a way to reproduce steadily. Our Postgres is behind PgBouncer with 5 active connections. So our assumption is that GetPassword is trying to execute in another transaction (another connection) which in our case cannot exceed 5 active connections. We can increase the number of connections but we think it won't solve the problem, but we will see it less often

AOne-T avatar Jul 09 '25 13:07 AOne-T

Hello, just checking in to see if there's any update related to this issue and if there's anything I can do to help you debug it.

tsardelli-dh avatar Aug 29 '25 08:08 tsardelli-dh

Hello, after more days of digging and debugging, I think I have finally found what causes the issue and why it happens only for the local PassworDB "connector" with connection limits set on the database (like SQLite or PgBouncer).

The high level flow is the following:

  1. Client sends a request to dex to refresh a token which is handled by handleRefreshToken and passed to updateRefreshToken
  2. In updateRefreshToken, we define a refreshTokenUpdater callback that will perform the actual s.refreshWithConnector flow. We then pass the callback to storage.UpdateRefreshToken like this: s.storage.UpdateRefreshToken(ctx, rCtx.storageToken.ID, refreshTokenUpdater). This is where the problem happens when the token update is handled by dex (PasswordDB) and not by an external connector like microsoft, google, oidc, etc. Here's why:
  3. In the case of the SQL storage implementation, UpdateRefreshToken starts a new transaction and, while the transaction is still open invokes the updater(r) function (which is the refreshTokenUpdater callback we just passed to it).
  4. The updater function will invoke s.refreshWithConnector() that will then invoke the Refresh() method in the connector itself. For the PassworDB case this will try to read from the database (SQLite) through the GetPassword() method. But this will be blocked because it tries to open a new connection while the transaction from step 3 is still open. This bug was introduced by https://github.com/dexidp/dex/commit/4b5f1d52 because refreshWithConnector() was moved inside UpdateRefreshToken while in the old flow it was executed before.

I hope my explanation is clear (and correct 😅), otherwise please let me know.

I can open a PR that should fix this issue using a mutex, but @nabokihms @sagikazarmark please let me know if you have different opinions on that (sorry for the direct ping but I'm not sure if you'll notice the comment on an old issue otherwise)

tsardelli-dh avatar Sep 04 '25 10:09 tsardelli-dh