improvement(logs): state machine of workflow execution
Summary
Made the state machine of workflow execution explicit. Execution status is no longer derived, either during execution or when persisting logs. When an execution is cancelled, the block that is currently executing runs to completion and the status remains `running` until it finishes. Only then does the execution transition to `cancelled`.
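A minimal sketch of these cancellation semantics, assuming a simplified sequential engine. `Block`, `RunContext`, and `runWorkflow` are hypothetical names, not the project's `ExecutionEngine` API. The cancel flag is checked only between blocks, so a block that has already started finishes, and the status leaves `running` only afterwards.

```typescript
// Hypothetical, simplified model; not the project's ExecutionEngine.
type ExecutionStatus = "running" | "completed" | "cancelled";

interface RunContext {
  isCancelled: boolean; // set by the cancel handler at any time
  status: ExecutionStatus;
  observed: ExecutionStatus[]; // statuses recorded by blocks while they ran
  finished: string[]; // IDs of blocks that ran to completion
}

interface Block {
  id: string;
  run: (ctx: RunContext) => void;
}

function runWorkflow(blocks: Block[], ctx: RunContext): ExecutionStatus {
  ctx.status = "running";
  for (const block of blocks) {
    // Cancellation is observed only between blocks; a started block is never interrupted.
    if (ctx.isCancelled) break;
    block.run(ctx);
    ctx.finished.push(block.id);
  }
  // Leave "running" only once no block is in flight.
  ctx.status = ctx.isCancelled ? "cancelled" : "completed";
  return ctx.status;
}

// Usage: the user cancels while block "a" is executing.
const ctx: RunContext = { isCancelled: false, status: "running", observed: [], finished: [] };
const result = runWorkflow(
  [
    {
      id: "a",
      run: (c) => {
        c.isCancelled = true; // cancel request arrives mid-block
        c.observed.push(c.status); // still "running": the cancel has not taken effect yet
      },
    },
    { id: "b", run: () => {} },
  ],
  ctx,
);
// result is "cancelled"; ctx.finished is ["a"] because "b" never started.
```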
Type of Change
- [x] Other: Code Improvement
Testing
Tested manually.
Checklist
- [x] Code follows project style guidelines
- [x] Self-reviewed my changes
- [x] Tests added/updated and passing
- [x] No new warnings introduced
- [x] I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)
Greptile Summary
Refactored workflow execution state management from derived states to an explicit state machine with a dedicated status column tracking execution lifecycle (running → completed/failed/cancelled/pending). The change eliminates ambiguity in determining execution state and properly handles cancellation scenarios.
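The lifecycle listed above can be encoded as an explicit transition table. This is a sketch under assumptions: the edges out of `pending` (back to `running` on resume, or to `failed`) are inferred from the human-in-the-loop description, and `transition` and `isTerminal` are illustrative helpers rather than code from this PR.

```typescript
type ExecutionStatus = "running" | "pending" | "completed" | "failed" | "cancelled";

// Assumed edges: running can finish, fail, be cancelled, or pause; pending can resume or fail.
const TRANSITIONS: Record<ExecutionStatus, readonly ExecutionStatus[]> = {
  running: ["completed", "failed", "cancelled", "pending"],
  pending: ["running", "failed"],
  completed: [],
  failed: [],
  cancelled: [],
};

// Returns the new status, or throws if the edge is not in the table.
function transition(from: ExecutionStatus, to: ExecutionStatus): ExecutionStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal status transition: ${from} -> ${to}`);
  }
  return to;
}

// Terminal states have no outgoing edges.
function isTerminal(status: ExecutionStatus): boolean {
  return TRANSITIONS[status].length === 0;
}

// Usage: an execution that pauses for human input, resumes, then completes.
let current: ExecutionStatus = "running";
current = transition(current, "pending");
current = transition(current, "running");
current = transition(current, "completed");
```

Keeping the allowed edges in one table means an illegal write, such as reviving a `completed` execution, fails loudly instead of silently producing an inconsistent log row.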
Key Changes:
- Added `status` column to `workflow_execution_logs` table to explicitly track execution state instead of deriving it from `level` and `endedAt`
- Implemented `completeWithCancellation()` in `LoggingSession` to handle cancelled executions distinctly from failures
- Added `completed` flag to `LoggingSession` to prevent duplicate completion calls (idempotency)
- Modified `ExecutionEngine` to check `isCancelled` flag and return `status='cancelled'` when cancellation is detected
- Enhanced cancellation flow to allow currently executing blocks to finish before transitioning to cancelled state
- Fixed race condition in chat execution cancellation by tracking active execution ID via `currentChatExecutionIdRef`
- Replaced `AbortController` with stream reader cancellation for better control of streaming cleanup
- Updated human-in-the-loop manager to maintain status as `pending` while paused and `running` during resume
- Improved error handling in background execution jobs with proper logging session completion
Migration:
- Backfills existing logs: `level='error'` → `status='failed'`, `endedAt IS NOT NULL` → `status='completed'`, otherwise `status='running'`
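The backfill rule reads as a precedence-ordered mapping. Below is a TypeScript rendering of it, assuming the conditions are evaluated in the order listed, so an errored row that also has `endedAt` becomes `failed`. `LegacyLogRow` and `backfillStatus` are hypothetical names, and the actual migration is SQL.

```typescript
type BackfilledStatus = "failed" | "completed" | "running";

// Hypothetical shape of a pre-migration row: only the columns the rule reads.
interface LegacyLogRow {
  level: string; // assumed values such as "info" or "error"
  endedAt: Date | null;
}

// Conditions are checked in the order the migration lists them; the first match wins.
function backfillStatus(row: LegacyLogRow): BackfilledStatus {
  if (row.level === "error") return "failed";
  if (row.endedAt !== null) return "completed";
  return "running";
}

// Usage
const erroredButEnded = backfillStatus({ level: "error", endedAt: new Date() }); // "failed"
const finishedRow = backfillStatus({ level: "info", endedAt: new Date() }); // "completed"
const inFlight = backfillStatus({ level: "info", endedAt: null }); // "running"
```

No legacy row is mapped to `cancelled` or `pending` by this rule.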
The refactoring improves clarity and correctness of execution state tracking throughout the system.
Confidence Score: 4/5
- This PR is safe to merge with minor considerations for testing edge cases
- The state machine refactoring is well-designed, with a proper migration and idempotency guards. The explicit status tracking eliminates ambiguity, and the cancellation flow correctly allows in-flight blocks to complete. One concern is the `completed` flag in `LoggingSession`, which could cause issues if the session object is reused, though this appears unlikely given the usage patterns. The race condition fix in chat execution is solid. All changes maintain backward compatibility through the migration.
- Pay close attention to `apps/sim/lib/logs/execution/logging-session.ts` to verify the `completed` flag behavior in all execution paths
Important Files Changed
| Filename | Overview |
|---|---|
| packages/db/migrations/0132_dazzling_leech.sql | Added status column to track explicit execution state (running/completed/failed/cancelled), backfilled with correct values based on existing data |
| apps/sim/lib/logs/execution/logging-session.ts | Added explicit state machine with completeWithCancellation() method and completed flag to prevent duplicate completions |
| apps/sim/executor/execution/engine.ts | Added cancellation checks in execution loop, returns 'cancelled' status when isCancelled flag is set |
| apps/sim/lib/workflows/executor/execution-core.ts | Routes execution to appropriate logging completion method based on status (cancelled/paused/completed) |
| apps/sim/app/workspace/[workspaceId]/w/[workflowId]/hooks/use-workflow-execution.ts | Fixed race condition in chat execution cancellation by tracking active execution ID and preventing stale cleanup operations |
| apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts | Updates execution log status to pending/running/failed as pause points are registered, resumed, or fail |
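The stale-cleanup guard in `use-workflow-execution.ts` can be illustrated without React. In this sketch a plain `{ current }` object stands in for a React ref, and `startChatExecution` and its returned cleanup function are hypothetical; only `currentChatExecutionIdRef` is a name taken from this PR. Each cleanup captures the ID of the run that created it and does nothing if a newer run has since taken over.

```typescript
// Stand-in for a React ref (the shape useRef returns) so the sketch runs without React.
interface MutableRef<T> {
  current: T;
}

const currentChatExecutionIdRef: MutableRef<string | null> = { current: null };
const events: string[] = [];

// Hypothetical: registers a run as the active one and returns its cleanup.
function startChatExecution(executionId: string): () => void {
  currentChatExecutionIdRef.current = executionId;
  return () => {
    // A cleanup may only reset shared state if its run is still the active one.
    if (currentChatExecutionIdRef.current !== executionId) {
      events.push(`skipped stale cleanup for ${executionId}`);
      return;
    }
    currentChatExecutionIdRef.current = null;
    events.push(`cleaned up ${executionId}`);
  };
}

// Usage: run-1 is cancelled and run-2 starts before run-1's cleanup fires.
const cleanupRun1 = startChatExecution("run-1");
const cleanupRun2 = startChatExecution("run-2");
cleanupRun1(); // stale: leaves run-2's state alone
cleanupRun2(); // current: resets the ref
```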
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant UI as UI Component
    participant Hook as useWorkflowExecution
    participant Core as execution-core.ts
    participant Engine as ExecutionEngine
    participant Session as LoggingSession
    participant DB as Database

    User->>UI: Clicks Run or Cancel

    alt Start Execution
        UI->>Hook: handleRunWorkflow()
        Hook->>Session: safeStart()
        Session->>DB: INSERT with status='running'
        Hook->>Core: executeWorkflowCore()
        Core->>Engine: execute()

        loop While hasWork()
            Engine->>Engine: Check isCancelled flag
            alt Not Cancelled
                Engine->>Engine: processQueue()
                Engine->>Engine: executeNodeAsync()
            else Cancelled and executing.size === 0
                Engine-->>Engine: Break loop
            end
        end

        Engine-->>Core: ExecutionResult with status

        alt Status: cancelled
            Core->>Session: safeCompleteWithCancellation()
            Session->>DB: UPDATE status='cancelled'
        else Status: paused
            Core->>Session: (Skip completion, keep running)
            Session->>DB: UPDATE status='pending'
        else Status: completed/failed
            Core->>Session: safeComplete/safeCompleteWithError()
            Session->>DB: UPDATE status='completed/failed'
        end

        Core-->>Hook: result
        Hook-->>UI: Update execution state
    else Cancel Execution
        UI->>Hook: handleCancelExecution()
        Hook->>Hook: Set context.isCancelled = true
        Hook->>Hook: Cancel stream reader
        Hook->>Hook: Reset execution state
        Note over Engine: Currently executing block<br/>continues to finish
        Engine->>Engine: Check isCancelled on next iteration
        Engine-->>Core: status='cancelled'
        Core->>Session: completeWithCancellation()
        Session->>DB: UPDATE status='cancelled'
    end

    alt Resume from Pause
        User->>UI: Provides input for paused execution
        UI->>Hook: Resume execution
        Hook->>DB: UPDATE status='running'
        Hook->>Core: executeWorkflowCore (resume mode)
        Core->>Engine: execute()
        Note over Engine: Continues from snapshot
        Engine-->>Core: ExecutionResult
        Core->>Session: Update parent execution log
        Session->>DB: UPDATE status based on result
    end
```
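The status branches at the end of the "Start Execution" path amount to a switch on the result status. A sketch, assuming a simplified result type: `ExecutionResultSketch`, `CompletionTarget`, and `finalizeLogging` are hypothetical, while the three `safe*` method names come from the diagram. The `never` check makes the compiler flag any status that lacks a branch, which is one practical payoff of an explicit status union.

```typescript
type ResultStatus = "completed" | "failed" | "cancelled" | "paused";

interface ExecutionResultSketch {
  status: ResultStatus;
  error?: string;
}

// Hypothetical subset of the session methods named in the diagram.
interface CompletionTarget {
  safeComplete(): void;
  safeCompleteWithError(error: string): void;
  safeCompleteWithCancellation(): void;
}

function finalizeLogging(result: ExecutionResultSketch, session: CompletionTarget): void {
  switch (result.status) {
    case "cancelled":
      session.safeCompleteWithCancellation();
      return;
    case "paused":
      // Leave the session open; the pause path records status='pending' instead.
      return;
    case "failed":
      session.safeCompleteWithError(result.error ?? "Unknown error");
      return;
    case "completed":
      session.safeComplete();
      return;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a status is unhandled.
      const unreachable: never = result.status;
      throw new Error(`Unhandled status: ${String(unreachable)}`);
    }
  }
}

// Usage with a recording stand-in for the session.
const calls: string[] = [];
const recorder: CompletionTarget = {
  safeComplete: () => { calls.push("complete"); },
  safeCompleteWithError: (e) => { calls.push(`error:${e}`); },
  safeCompleteWithCancellation: () => { calls.push("cancelled"); },
};
finalizeLogging({ status: "cancelled" }, recorder);
finalizeLogging({ status: "paused" }, recorder);
finalizeLogging({ status: "failed", error: "timeout" }, recorder);
```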