[Bug]: 401 on privileged actions after cold restart despite valid login
🐞 Bug Summary
After a cold restart of the server/Kubernetes node (e.g., powered off overnight), the Admin Web UI intermittently returns 401 Unauthorized for privileged actions even though I appear logged in. Affected actions include adding MCP servers, viewing metrics, and creating servers.
🧩 Affected Component
Select the area of the project impacted:
- [x] `mcpgateway` - API
- [x] `mcpgateway` - UI (admin panel)
- [ ] `mcpgateway.wrapper` - stdio wrapper
- [ ] Federation or Transports
- [ ] CLI, Makefiles, or shell scripts
- [ ] Container setup (Docker/Podman/Compose)
- [ ] Other (explain below)
🔁 Steps to Reproduce
- Deploy `ghcr.io/ibm/mcp-context-forge:latest` on Kubernetes with the UI and Admin API enabled and auth required (env excerpt below). The DB is SQLite on a PVC at `/data`.
- Power off the host (or shut down the cluster) at the end of the day; power back on the next day. (A cold start of the pod may also reproduce it.)
- Log into the Admin UI (Basic Auth).
- Try any privileged action: Add MCP server, Metrics tab, Create server, etc.
- Those API calls return “401 Unauthorized” while the UI still indicates I’m logged in.
🤔 Expected Behavior
Admin actions should succeed when authenticated (200/201 responses), without requiring any extra steps after a cold restart.
📓 Logs / Error Output
Network panel shows 401 on endpoints such as /admin/servers, /admin/metrics, and related admin routes.
Pod logs primarily show 401 responses for those requests (no stacktrace).
⚠️ No secrets included. (Can provide additional sanitized logs if needed.)
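To take the browser out of the loop, here is a minimal curl repro sketch. The Service name `mcpgateway` is an assumption (it isn't in the report); the namespace `mcp` comes from the environment table below:

```bash
# Forward the in-cluster Service locally (Service name is an assumption).
kubectl -n mcp port-forward svc/mcpgateway 4444:4444 &

# Unauthenticated request: expect 401, matching what the UI sees.
curl -i http://localhost:4444/admin/servers

# Same request with Basic Auth: if this returns 200 while the browser still
# gets 401, the problem is cookie transmission, not the credentials.
curl -i -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASSWORD" http://localhost:4444/admin/servers
```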
🧠 Environment Info
You can retrieve most of this from the /version endpoint.
| Key | Value |
|---|---|
| Version or commit | ghcr.io/ibm/mcp-context-forge:latest (as of 2025-08-27) |
| Runtime | Containerized in Kubernetes (auth required; UI + Admin API enabled) |
| Platform / OS | Kubernetes cluster (Namespace mcp) |
| Container | Deployed via Deployment + PVC; Service is ClusterIP (HTTP to port 4444) |
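For completeness, the `/version` endpoint mentioned above can be queried directly (a sketch reusing the port-forward from the repro section; whether `/version` itself requires auth is an assumption):

```bash
# Pull the environment details the template asks for.
curl -s -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASSWORD" http://localhost:4444/version
```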
🧩 Additional Context (optional)
Kubernetes manifest (relevant bits):
```yaml
env:
  - { name: HOST, value: "0.0.0.0" }
  - { name: MCPGATEWAY_UI_ENABLED, value: "true" }
  - { name: MCPGATEWAY_ADMIN_API_ENABLED, value: "true" }
  - { name: AUTH_REQUIRED, value: "true" }
  - name: BASIC_AUTH_USER
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_USER } }
  - name: BASIC_AUTH_PASSWORD
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_PASSWORD } }
  - name: JWT_SECRET_KEY
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: JWT_SECRET_KEY } }
  - name: DATABASE_URL
    value: "sqlite:////data/gateway/mcp.db"
```
Notes / hypotheses to help triage (diagnostic sketches follow this list):
- If cookies are marked `Secure` and the UI is accessed over plain HTTP, the browser won’t send the cookie, which could present as 401s on admin routes after a restart/session change. Consider reproducing with HTTPS or, only for testing, `SECURE_COOKIES=false`.
- Confirm whether admin auth relies on a cookie vs. a header in the UI; check `COOKIE_SAMESITE` and related settings.
- Verify that the JWT signing key (`JWT_SECRET_KEY`) and server time are stable across restarts (clock skew can invalidate tokens).
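Sketches for the first and third checks (the login path `/admin/` and the Deployment name `mcpgateway` are assumptions, not taken from the gateway source):

```bash
# 1) Which flags does the server set on its auth cookie at login?
#    (-D - dumps response headers to stdout.)
curl -s -D - -o /dev/null -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASSWORD" \
  http://localhost:4444/admin/ | grep -i '^set-cookie'

# 2) Is the pod clock aligned? Compare container time against local time.
kubectl -n mcp exec deploy/mcpgateway -- date -u && date -u
```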
Potential directions:
- Provide guidance on expected cookie settings for HTTP vs HTTPS deployments.
- Clarify whether the UI refreshes/rotates tokens after pod restarts, and if any cache needs to be cleared.
- Any known issues with SQLite + PVC on restart that could affect session storage would be helpful to rule in/out.
Hi @InigoGastesi - thanks for the detailed bug report! I was able to reproduce the issue:
You're hitting a cookie security configuration issue. Your deployment has SECURE_COOKIES: "true" (the Helm chart default; your manifest doesn't set it explicitly) but you're accessing over plain HTTP (ClusterIP without TLS).
When cookies are marked Secure, browsers won't send them over HTTP - it's a security feature. So what happens is:
- Login works, cookie gets set
- Browser stores the cookie but refuses to send it on subsequent HTTP requests
- Server sees no auth token → 401
- UI still thinks you're logged in (cookie exists locally) but every API call fails
This explains why it's intermittent after restart - the cookie is there but never gets transmitted.
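You can watch the mechanism with curl's cookie jar (a sketch; the login path and the `$USER`/`$PASS` variables are placeholders):

```bash
# Log in and save the cookie; -c writes a Netscape-format cookie jar.
curl -s -c jar.txt -o /dev/null -u "$USER:$PASS" http://localhost:4444/admin/

# A TRUE in the "secure" column marks the cookie HTTPS-only.
cat jar.txt

# curl honors the Secure flag like a browser does: over plain HTTP the cookie
# is withheld, so this should come back 401 even though jar.txt holds it.
curl -i -b jar.txt http://localhost:4444/admin/servers
```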
Quick Fix
Add this to your values or ConfigMap:
```yaml
mcpContextForge:
  config:
    SECURE_COOKIES: "false"
```
Then restart your pods. Should work immediately.
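If the value comes from a ConfigMap, the pods need a restart to pick it up; for example (the Deployment name is an assumption, substitute your release name):

```bash
kubectl -n mcp rollout restart deployment/mcpgateway
kubectl -n mcp rollout status deployment/mcpgateway
```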
Proper Fix (for production)
Enable TLS on your ingress:
```yaml
mcpContextForge:
  ingress:
    enabled: true
    tls:
      enabled: true
```
Keep SECURE_COOKIES: "true" when using HTTPS.
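Once TLS is in place you can confirm the cookie round-trips (a sketch; the hostname is a placeholder for your ingress host):

```bash
# Log in over HTTPS and replay the saved cookie; expect 200/201 this time.
curl -s -c jar.txt -o /dev/null -u "$USER:$PASS" https://gateway.example.com/admin/
curl -i -b jar.txt https://gateway.example.com/admin/servers
```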
Why the confusion
The current Helm chart defaults to SECURE_COOKIES: "true" even though most dev setups use plain HTTP. The warning in the logs only shows up on initial login failure, not on the 401s afterward, making it hard to diagnose.
We should probably:
- Default to `false` in the Helm chart for easier setup
- Add better warnings when secure cookies are used over HTTP
- Auto-detect the protocol and adjust cookie flags accordingly
Let me know if setting SECURE_COOKIES: "false" resolves it!