claude-code icon indicating copy to clipboard operation
claude-code copied to clipboard

[BUG] EPERM on long commands still crashing Claude Code

Open joetomasone opened this issue 3 months ago • 5 comments

Preflight Checklist

  • [x] I have searched existing issues and this hasn't been reported yet
  • [x] This is a single bug report (please file separate reports for different bugs)
  • [x] I am using the latest version of Claude Code

What's Wrong?

My understanding is that long commands are supposed to be running in the background now, but either this functionality is not working or it failed to work in this and other cases.

Claude Code Crash Report - 2025-10-17

Summary

Claude Code crashed with Error: kill EPERM when executing a chained systemctl command, despite having crash prevention rules in context.

Environment

  • Claude Code Version: 2.0.21
  • System: Ubuntu 24.04.2 LTS
  • Date/Time: 2025-10-17 ~14:45 UTC
  • Session Context Usage: ~4% remaining before crash

Crash Details

Exact Command Executed

sudo systemctl restart storms-monitor.service && sleep 5 && sudo systemctl status storms-monitor.service | head -15

Error Message

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:232:13)
    at ZwA (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2927)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2800
    at Array.forEach (<anonymous>)

Tool Call Details

  • Tool: Bash
  • Description: "Restart storms-monitor service and check status"
  • Timeout: Default 120000ms (2 minutes)
  • Actual Duration: Exceeded timeout during command chain execution

Context

Rules Known to Claude

The following rules were read at session startup from /opt/CLAUDE.md:

## 🚨 STOP! CRASH PREVENTION WARNING 🚨

**THESE COMMANDS WILL 100% CRASH CLAUDE CLI:**
```bash
# ❌ ANY command with -f (follow) flag
journalctl -f              # CRASH!
tail -f                    # CRASH!

# ❌ ANY piped command that runs long
python script.py | head    # CRASH!

# ❌ ANY Python script > 2 minutes
sudo -u flask venv/bin/python long_script.py  # CRASH!

# ❌ Flask development servers (wsgi.py)
sudo -u flask venv/bin/python wsgi.py  # CRASH!

USE SAFE ALTERNATIVES:

  • journalctl --since "1 minute ago" (no -f flag)
  • Write to file first: python script.py > out.txt && cat out.txt
  • For long scripts: "This will timeout. Please run: [command]"

### What Happened
1. Claude made a code fix to a python file
2. Wanted to verify the fix worked immediately
4. Executed chained systemctl command to restart + verify in one step
5. Command exceeded 2-minute timeout threshold
6. Claude Code attempted to kill the process → EPERM error
7. Session crashed, all context lost
### Expected Behavior
**Option 1**: Command should have been rejected with a warning about dangerous pattern
**Option 2**: Timeout should gracefully terminate without crashing the entire session
**Option 3**: EPERM should be caught and handled without session termination

### Actual Behavior
Session crashed with EPERM, all context and conversation history lost.

## Root Cause Analysis

### Immediate Cause
Chaining `systemctl restart` (which can take time) with `sleep 5` and `status | head -15` exceeded the 2-minute Bash timeout, triggering an EPERM error during process termination.

### Deeper Issue
Despite having explicit crash prevention rules in context:
1. Claude had full knowledge of the dangerous patterns
2. Rules were read 5-10 minutes prior to the crash
3. The specific pattern used was covered in the rules
4. Cognitive bias toward "verify immediately" overpowered rule-checking

This suggests **documentation-based prevention is insufficient** - dangerous patterns need **architectural enforcement** at the tool execution layer.

## Pattern Recognition Issues

### This Pattern Recurs
From `/opt/CLAUDE.md`:
```markdown
### 7. **⚠️ LITERAL VALUE CORRUPTION** - CRITICAL DATA INTEGRITY BUG (GitHub #3004)

The crash prevention rules are documented similarly to other critical rules, but compliance is inconsistent across sessions. This is not an isolated incident.

Medium Priority (Feature Requests)

  1. Command Validation Layer: Optional pre-execution validation against known dangerous patterns
  2. Safety Mode Toggle: Allow users to enable strict mode that rejects dangerous patterns
  3. Timeout Warnings: Warn when commands approach timeout threshold instead of hard failure

Low Priority (Documentation)

  1. Document that rules in project files may not reliably affect LLM behavior
  2. Recommend architectural enforcement over documentation-based prevention
  3. Add crash recovery guidance for users

Reproduction Steps

  1. Have crash prevention rules in CLAUDE.md
  2. Read CLAUDE.md at session startup
  3. Execute any chained systemctl command that might take >2 minutes:
    sudo systemctl restart service.service && sleep 5 && sudo systemctl status service.service
    
  4. Wait for timeout (120 seconds)
  5. Observe EPERM crash

Expected Fix Verification

After fix is applied:

  • Chained systemctl commands should either be rejected or timeout gracefully
  • EPERM during process termination should not crash the session
  • User should receive error message and maintain context

Additional Context

This crash occurred during active development work after successfully fixing a code bug. The loss of context required re-reading multiple files and re-establishing session state. The crash prevention rules were demonstrably insufficient despite being well-documented and recently read.

User Impact

  • Complete loss of session context
  • Need to re-establish project state
  • Interruption of active debugging workflow
  • Repeated pattern across multiple sessions (per user report)

What Should Happen?

Claude should either not execute the command or should not crash on EPERM.

Error Messages/Logs

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:232:13)
    at ZwA (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2927)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2800
    at Array.forEach (<anonymous>)

Steps to Reproduce

Execute a long command that exceeds two minutes in runtime. Example:

sudo systemctl restart && sleep 5 && sudo systemctl status | head -15 - where the restart takes sufficient time.

Claude Model

Sonnet (default)

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

2.0.21 (Claude Code)

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

No response

joetomasone avatar Oct 17 '25 15:10 joetomasone