agents system
WIP
⚠️ No Changeset found
Latest commit: c3a0a43c9c0d15bf830dff663339f52bf35c1c1d
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
Claude Code Review
Status: WIP - Large agent system implementation
This PR introduces a comprehensive agent orchestration system built on Cloudflare Workers and Durable Objects. Given the WIP status, here are substantive issues to address:
Critical Issues
1. Race condition in subagent cancellation (packages/agents/src/sys/agent/index.ts:263-291)
The cancel operation iterates over waiting subagents without proper atomicity. Between fetching waitingSubagents and completing cancellation, new subagents could spawn or complete, leading to inconsistent state.
- Fix: Wrap the entire cancel logic in blockConcurrencyWhile (as childResult does at line 543)
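A minimal sketch of the suggested fix. It assumes a parent that tracks subagents in a Map and reaches the Durable Object state through a narrow interface. SketchAgent, SerialGate and the status names are hypothetical. SerialGate is only a local test double that serializes callbacks with a promise chain to mimic blockConcurrencyWhile; inside a Worker you would call blockConcurrencyWhile on the Durable Object state instead.

```typescript
// Narrow view of the Durable Object state API used here (assumption: only
// blockConcurrencyWhile is needed for this sketch).
interface ConcurrencyGate {
  blockConcurrencyWhile<T>(fn: () => Promise<T>): Promise<T>;
}

// Local test double: runs callbacks one at a time via a promise chain,
// approximating "no other event is delivered until this callback settles".
class SerialGate implements ConcurrencyGate {
  private tail: Promise<unknown> = Promise.resolve();

  blockConcurrencyWhile<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.tail.then(fn);
    this.tail = run.catch(() => undefined); // keep the chain alive after failures
    return run;
  }
}

type SubagentStatus = "waiting" | "completed" | "cancelled";

// Hypothetical parent agent; names do not come from packages/agents.
class SketchAgent {
  status: "paused" | "cancelled" = "paused";
  readonly subagents = new Map<string, SubagentStatus>();
  private readonly gate: ConcurrencyGate;

  constructor(gate: ConcurrencyGate) {
    this.gate = gate;
  }

  // Reading the waiting set and cancelling each entry happen in one critical
  // section, so a childResult cannot land between the read and the writes.
  cancel(): Promise<string[]> {
    return this.gate.blockConcurrencyWhile(async () => {
      const waiting: string[] = [];
      this.subagents.forEach((s, id) => {
        if (s === "waiting") waiting.push(id);
      });
      for (const id of waiting) {
        await Promise.resolve(); // stands in for an awaited call to the child DO
        this.subagents.set(id, "cancelled");
      }
      this.status = "cancelled";
      return waiting;
    });
  }

  // Results that arrive after cancellation (or for unknown children) are ignored.
  childResult(id: string): Promise<void> {
    return this.gate.blockConcurrencyWhile(async () => {
      if (this.status === "cancelled" || this.subagents.get(id) !== "waiting") return;
      this.subagents.set(id, "completed");
    });
  }
}
```

Because childResult goes through the same gate, a result that arrives mid-cancel is processed after cancellation finishes and is dropped, instead of flipping a subagent back to completed.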
2. Stale cache reads in Store (packages/agents/src/sys/agent/store.ts:167-205)
appendMessages invalidates the cache, but other mutations such as editFile do not, so some operations leave stale cache entries behind while others don't.
- Impact: File edits may not be visible to subsequent reads
- Fix: Invalidate consistently on every mutation, or adopt a single unified cache strategy
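One way to make invalidation uniform is to route every write through a single helper that clears the cache. The sketch below uses an in-memory stand-in: CachedStore and its method bodies are hypothetical (the real Store is backed by SQLite), and clearing the whole cache on each write is simply the bluntest unified strategy, trading some hit rate for read-after-write consistency.

```typescript
// Hypothetical in-memory stand-in for the Store; the real one persists to SQLite.
class CachedStore {
  private readonly files = new Map<string, string>();
  private readonly messages: string[] = [];
  private readonly cache = new Map<string, unknown>();

  // Every mutation goes through here, so no write path can forget to invalidate.
  private mutate(write: () => void): void {
    write();
    this.cache.clear();
  }

  // Memoizes a read under `key` until the next mutation.
  private cached<T>(key: string, load: () => T): T {
    if (!this.cache.has(key)) this.cache.set(key, load());
    return this.cache.get(key) as T;
  }

  appendMessages(msgs: string[]): void {
    this.mutate(() => {
      this.messages.push(...msgs);
    });
  }

  editFile(path: string, content: string): void {
    this.mutate(() => {
      this.files.set(path, content);
    });
  }

  readFile(path: string): string | undefined {
    return this.cached(`file:${path}`, () => this.files.get(path));
  }

  // Returns the same frozen snapshot until a write invalidates it, rather than
  // rebuilding the array on every call.
  listMessages(): readonly string[] {
    return this.cached("messages", () => Object.freeze(this.messages.slice()));
  }
}
```

A finer-grained variant would invalidate only the keys a mutation touches (for example `file:${path}` in editFile), at the cost of having to keep that mapping correct by hand.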
3. Missing error handling in async middleware hooks (packages/agents/src/sys/agent/index.ts:466-477)
step() catches errors, but the individual middleware hook calls in executePendingTools (lines 373, 415-419) are not wrapped in try/catch, so a failing onToolStart could crash the entire agent.
- Fix: Wrap middleware hook calls in try-catch
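A sketch of isolating each hook call. ToolCall, Middleware and runOnToolStart are hypothetical shapes, not the package's real types; the point is that awaiting inside the try block catches both synchronous throws and rejected promises, and that one bad hook does not stop the others.

```typescript
// Hypothetical tool-call and middleware shapes; the real hook signatures may differ.
interface ToolCall {
  id: string;
  name: string;
}

interface Middleware {
  name: string;
  onToolStart?: (call: ToolCall) => void | Promise<void>;
}

// Invokes onToolStart on every middleware, catching both synchronous throws and
// rejected promises so one faulty hook cannot abort the tool run.
async function runOnToolStart(
  middlewares: readonly Middleware[],
  call: ToolCall,
  reportError: (middleware: string, err: unknown) => void,
): Promise<void> {
  for (const mw of middlewares) {
    if (!mw.onToolStart) continue;
    try {
      await mw.onToolStart(call);
    } catch (err) {
      reportError(mw.name, err);
    }
  }
}
```

Whether a hook failure should be swallowed, emitted as an event, or fail the tool call is a policy decision; this sketch only hands it to a reporting callback.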
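4. SQL injection risk from string interpolation (packages/agents/src/sys/agent/store.ts:316)
readFile builds its SQL query with a template literal. The path comes only from internal sources today, but interpolating values into SQL is a security antipattern.
- Fix: Use parameterized queries consistently. If the Store calls the Durable Object SQL API directly, bindings are passed as trailing arguments rather than as an array:
sql.exec('SELECT content FROM files WHERE path = ?', path)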
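Since readFile reaches for a template literal, a tagged template can keep that ergonomics while still producing a parameterized query. The sqlQuery tag below is a hypothetical helper, not part of the Cloudflare API: it turns each ${...} into a ? placeholder and collects the values as bindings, which could then be spread into a SqlStorage.exec call.

```typescript
// Hypothetical helper: a tagged template that turns each ${...} into a "?"
// placeholder and collects the interpolated values as bindings, so values are
// never spliced into the SQL text.
interface BoundQuery {
  text: string;
  bindings: unknown[];
}

function sqlQuery(strings: TemplateStringsArray, ...values: unknown[]): BoundQuery {
  let text = strings[0];
  for (let i = 1; i < strings.length; i++) {
    text += "?" + strings[i];
  }
  return { text, bindings: values };
}

// Usage with a hostile path: the quote stays inside the binding.
const path = "notes.md'; DROP TABLE files; --";
const query = sqlQuery`SELECT content FROM files WHERE path = ${path}`;
// In a Durable Object this would be passed on as:
//   this.ctx.storage.sql.exec(query.text, ...query.bindings)
```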
Architecture Issues
5. Unbounded recursion risk in agent loops
No explicit limit on run steps. A misbehaving agent could exhaust resources with infinite tool calls.
- Recommendation: Add a maxSteps config per blueprint, backed by a circuit breaker
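A sketch of the maxSteps recommendation in its simplest form: a hard cap that stops the loop and raises once the budget is spent. Blueprint, maxSteps, runWithStepLimit and the default of 50 are hypothetical placeholders; a fuller circuit breaker might also track tool-call rate or repeated identical calls.

```typescript
// Hypothetical blueprint shape: maxSteps is the proposed option, not an
// existing field.
interface Blueprint {
  name: string;
  maxSteps?: number;
}

type StepOutcome = "continue" | "done";

// Runs `step` until it reports "done", tripping a hard stop once the blueprint's
// step budget is spent. Returns the number of steps taken.
async function runWithStepLimit(
  blueprint: Blueprint,
  step: () => Promise<StepOutcome>,
  defaultMaxSteps = 50, // placeholder default, not taken from the codebase
): Promise<number> {
  const maxSteps = blueprint.maxSteps ?? defaultMaxSteps;
  for (let n = 1; n <= maxSteps; n++) {
    if ((await step()) === "done") return n;
  }
  throw new Error(`${blueprint.name}: exceeded maxSteps (${maxSteps})`);
}
```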
6. No retry logic for inter-DO communication
Subagent spawning (including the /child_result calls) can fail due to network issues, and a failure leaves the parent paused indefinitely.
- Recommendation: Add retries with exponential backoff, or timeout-based recovery
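One reading of the retry recommendation is a generic helper with exponential backoff and full jitter. retryWithBackoff and RetryOptions are hypothetical; the wrapped call would be whatever delivers the /child_result message. Retrying is only safe if the parent treats a repeated result as a no-op, so the receiving handler must be idempotent.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first
  baseMs: number; // delay cap for the first retry
  maxMs: number; // upper bound on any single delay
}

// Retries `fn` with exponential backoff and full jitter: before retry k
// (counting from 0) the delay is drawn uniformly from [0, min(maxMs, baseMs * 2^k)).
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries) throw err;
      const cap = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * cap));
    }
  }
}
```

For the timeout-based alternative, a Durable Object can schedule its own wake-up with storage.setAlarm() and, in its alarm() handler, check whether a paused parent has waited too long for a child.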
Testing Gaps
7. Missing tests for concurrent operations
Tests don't verify concurrent invoke calls, race conditions between cancel and childResult, or multiple subagents completing simultaneously.
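A sketch of the kind of test this gap calls for: several children report at once, one result is delivered twice, and the parent must resume exactly once. ParentSketch is a hypothetical stand-in that serializes childResult with a promise chain in place of blockConcurrencyWhile. If that serialization is removed, every child passes its check before any of them resumes, and the same test counts more than one resume, which is exactly the race it is meant to catch.

```typescript
// Hypothetical parent that resumes once its last waiting child has reported.
class ParentSketch {
  resumeCount = 0;
  private readonly waiting: Set<string>;
  private tail: Promise<unknown> = Promise.resolve();

  constructor(childIds: string[]) {
    this.waiting = new Set(childIds);
  }

  // Serialized like a blockConcurrencyWhile section: one result at a time.
  childResult(id: string): Promise<void> {
    const run = this.tail.then(async () => {
      if (!this.waiting.delete(id)) return; // duplicate or unknown child
      await Promise.resolve(); // simulated storage write between check and resume
      if (this.waiting.size === 0) this.resumeCount++;
    });
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// The test: all children complete "simultaneously", and "b" is delivered twice.
async function simultaneousCompletionTest(): Promise<number> {
  const parent = new ParentSketch(["a", "b", "c"]);
  await Promise.all(["a", "b", "c", "b"].map((id) => parent.childResult(id)));
  return parent.resumeCount; // expected: 1
}
```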
8. No test coverage for Store cache invalidation
The cache invalidation bugs aren't caught because tests don't verify read-after-write consistency across different Store methods.
Minor Issues
- Type safety: the Info type declares threadId as optional, but the code assumes it is always set after registration (lines 117, 303)
- Documentation: the subagent flow described in architecture.md doesn't mention how cancellation propagates
- Performance: listMessages() and listEvents() rebuild full arrays on every call, even when cached (lines 208, 454)
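For the type-safety point, one option is to narrow the optional field once, where registration is assumed, so an unregistered agent fails loudly instead of propagating undefined. Info's shape here is inferred from the review (only threadId is known); RegisteredInfo, assertRegistered and threadKey are hypothetical names.

```typescript
// Shape assumed from the review: only the optional threadId is known.
interface Info {
  name: string;
  threadId?: string;
}

type RegisteredInfo = Info & { threadId: string };

// Fails loudly if an agent is used before registration, and narrows the type
// so later code can read threadId without further checks.
function assertRegistered(info: Info): asserts info is RegisteredInfo {
  if (info.threadId === undefined) {
    throw new Error(`agent "${info.name}" used before registration`);
  }
}

function threadKey(info: Info): string {
  assertRegistered(info);
  return `thread:${info.threadId}`; // threadId is string here, not string | undefined
}
```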
Positive Notes
- Strong separation of concerns (Store, SystemAgent, middleware)
- Comprehensive event emission for observability
- Good WebSocket integration for real-time updates
- Extensive test coverage for happy paths
Next Steps: Focus on concurrency correctness and error handling before expanding features.