crewAI [FEATURE] Support client-initiated real-time human-input event stream (WebSocket/SSE/long-polling) for pending human input

Feature Area

Other (please specify in additional context)

Is your feature request related to an existing bug? Please link it here.

Not a bug. This builds on prior feature discussions that were closed as “not planned” (#654, #2051), but reframes the problem around offering alternative integration methods for human input delivery.

Describe the solution you'd like

Background / Problem

CrewAI currently signals that it needs human input (i.e., enters “Pending Human Input”) only via externally delivered webhooks. That creates integration friction in scenarios where hosting a publicly reachable webhook endpoint is hard or undesirable (local dev behind NAT, locked-down security environments, real-time stacks already using persistent connections, etc.). The goal here is not to “fall back” to something else; it is to offer additional integration methods: client-initiated, real-time delivery of the human-input pause event, so consumers can subscribe without exposing endpoints externally.

Proposal

Introduce an optional, client-initiated event stream (e.g., WebSocket, Server-Sent Events, or long-polling) for receiving “pending_human_input” notifications for a given crew/execution. This would sit alongside (not necessarily replace) the webhook mechanism and give integrators flexibility in how they receive clarification requests from the agent.

Key capabilities

- Authenticated subscription per crew/execution using existing API credentials.
- Real-time delivery of human-input pause events over a persistent channel (WebSocket/SSE) or efficient poll-style fallback (long-polling) when a persistent connection isn’t feasible.
- Structured event payload including:
  - event: e.g., "pending_human_input"
  - execution_id
  - crew_id
  - task_id
  - prompt / clarification question
  - context / relevant metadata
  - reason_flags (why input was requested)
  - event_id (deduplication)
  - timestamp
- Reconnect & resume semantics: clients can recover missed events using last-seen event_id.
- Ordering/dedupe support so integrations can safely handle retries or duplicate deliveries.
Lightweight handshake example (WebSocket):


// Client opens:
wss://api.crewai.com/v1/crews/{crew_id}/human-input-stream
Authorization: Bearer <token>

// Server sends:
{
  "event": "pending_human_input",
  "execution_id": "...",
  "task_id": "...",
  "prompt": "Need more details about current traffic volume",
  "context": { ... },
  "reason_flags": ["ambiguity", "missing_field"],
  "event_id": "uuid",
  "timestamp": "2025-08-02T16:00:00Z"
}
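
To illustrate how an integrator might consume such a payload, here is a minimal client-side sketch of the dedupe and resume bookkeeping described above. It is not an existing CrewAI API: `PendingHumanInput` and `HumanInputInbox` are hypothetical names, the field names are taken from the example payload, and how `last_event_id` would be sent on reconnect (header, query parameter, first message) is left open because the proposal does not specify it.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PendingHumanInput:
    """Typed view of the proposed "pending_human_input" payload (hypothetical)."""
    execution_id: str
    task_id: str
    prompt: str
    event_id: str
    timestamp: str
    context: dict[str, Any] = field(default_factory=dict)
    reason_flags: list[str] = field(default_factory=list)


class HumanInputInbox:
    """Drops duplicate deliveries and remembers the last-seen event_id."""

    def __init__(self) -> None:
        # Unbounded for simplicity; a real client would cap or expire this set.
        self._seen: set[str] = set()
        # Value a client could present on reconnect to request missed events.
        self.last_event_id: Optional[str] = None

    def accept(self, raw: str) -> Optional[PendingHumanInput]:
        """Parse one message; return None for other event types or duplicates."""
        msg = json.loads(raw)
        if msg.get("event") != "pending_human_input":
            return None
        event_id = msg["event_id"]
        if event_id in self._seen:
            return None  # retry or duplicate delivery, safe to ignore
        self._seen.add(event_id)
        self.last_event_id = event_id
        return PendingHumanInput(
            execution_id=msg["execution_id"],
            task_id=msg["task_id"],
            prompt=msg["prompt"],
            event_id=event_id,
            timestamp=msg["timestamp"],
            context=msg.get("context", {}),
            reason_flags=msg.get("reason_flags", []),
        )
```

The same inbox would work unchanged whether messages arrive over WebSocket, SSE, or long-polling, since it only sees the serialized payload.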

Coexistence: If an integrator prefers or falls back to webhooks, that flow continues unchanged.

Describe alternatives you've considered

Webhook-only delivery (current default): requires public endpoint exposure and often tunneling (ngrok) in constrained environments.
Polling the execution status to detect when human input is needed: adds latency, inefficiency, and complexity around rate-limiting.
Proxying webhook callbacks internally and then pushing over internal WebSocket/SSE: works but still depends on the publicly routable webhook and adds extra correlation layers.
Hybrid long-polling/short-polling for “needs input” signals: viable in some constrained cases, but has scalability/latency trade-offs compared to persistent streams.
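
For comparison, the long-polling alternative could look roughly like the sketch below on the client side. Everything here is an assumption for illustration: `long_poll` and the `fetch` callable are hypothetical, and `fetch(cursor, timeout_s)` stands in for a request that blocks until events newer than `cursor` exist or the timeout elapses, returning a possibly empty list of event dicts.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport hook: fetch(cursor, timeout_s) -> list of event dicts.
# An empty list means the server-side wait timed out with nothing new.
Fetch = Callable[[Optional[str], float], list]


def long_poll(
    fetch: Fetch,
    cursor: Optional[str] = None,
    timeout_s: float = 30.0,
    max_rounds: Optional[int] = None,
) -> Iterator[dict]:
    """Yield events in order, advancing the cursor to the last-seen event_id."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        for event in fetch(cursor, timeout_s):
            # Each request resumes from the newest event already delivered.
            cursor = event["event_id"]
            yield event
```

Each round costs one request even when nothing is pending, which is the scalability and latency trade-off noted above relative to a persistent stream.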