[BUG] EPERM on long commands still crashing Claude Code

Open joetomasone opened this issue 3 months ago • 5 comments

Preflight Checklist

[x] I have searched existing issues and this hasn't been reported yet
[x] This is a single bug report (please file separate reports for different bugs)
[x] I am using the latest version of Claude Code

What's Wrong?

My understanding is that long-running commands are now supposed to run in the background, but either that feature is not working or it did not kick in for this case and several others.

Claude Code Crash Report - 2025-10-17

Summary

Claude Code crashed with Error: kill EPERM when executing a chained systemctl command, despite having crash prevention rules in context.

Environment

Claude Code Version: 2.0.21
System: Ubuntu 24.04.2 LTS
Date/Time: 2025-10-17 ~14:45 UTC
Session Context Usage: ~4% remaining before crash

Crash Details

Exact Command Executed

sudo systemctl restart storms-monitor.service && sleep 5 && sudo systemctl status storms-monitor.service | head -15

Error Message

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:232:13)
    at ZwA (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2927)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2800
    at Array.forEach (<anonymous>)

Tool Call Details

Tool: Bash
Description: "Restart storms-monitor service and check status"
Timeout: Default 120000ms (2 minutes)
Actual Duration: Exceeded timeout during command chain execution

Context

Rules Known to Claude

The following rules were read at session startup from /opt/CLAUDE.md:

## 🚨 STOP! CRASH PREVENTION WARNING 🚨

**THESE COMMANDS WILL 100% CRASH CLAUDE CLI:**
```bash
# ❌ ANY command with -f (follow) flag
journalctl -f              # CRASH!
tail -f                    # CRASH!

# ❌ ANY piped command that runs long
python script.py | head    # CRASH!

# ❌ ANY Python script > 2 minutes
sudo -u flask venv/bin/python long_script.py  # CRASH!

# ❌ Flask development servers (wsgi.py)
sudo -u flask venv/bin/python wsgi.py  # CRASH!
```

USE SAFE ALTERNATIVES:

journalctl --since "1 minute ago" (no -f flag)
Write to file first: python script.py > out.txt && cat out.txt
For long scripts: "This will timeout. Please run: [command]"


### What Happened
1. Claude made a code fix to a Python file
2. Claude wanted to verify immediately that the fix worked
3. Executed a chained systemctl command to restart + verify in one step
4. Command exceeded the 2-minute timeout threshold
5. Claude Code attempted to kill the process → EPERM error
6. Session crashed, all context lost

### Expected Behavior
**Option 1**: Command should have been rejected with a warning about dangerous pattern
**Option 2**: Timeout should gracefully terminate without crashing the entire session
**Option 3**: EPERM should be caught and handled without session termination
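
To make Option 3 concrete, here is a minimal sketch of what catching EPERM around `process.kill` could look like. The helper names (`safeKill`, `killAll`, `KillOutcome`) are hypothetical and this is not Claude Code's actual implementation; the only thing taken from the report is that the stack trace shows `process.kill` being called inside an `Array.forEach`. `kill(2)` returns EPERM when the caller is not permitted to signal the target process, which seems plausible here because the commands ran under `sudo`, but that is an assumption rather than a confirmed diagnosis.

```typescript
// Hypothetical helpers for illustration; not Claude Code's actual code.
type KillOutcome = "signalled" | "not-permitted" | "already-exited";
type KillFn = (pid: number, signal: string) => unknown;

// Defaults to Node's process.kill; injectable so the behaviour can be tested.
const nodeKill: KillFn = (pid, signal) => process.kill(pid, signal);

// Send a signal and report the outcome instead of letting EPERM/ESRCH propagate.
function safeKill(pid: number, signal: string = "SIGTERM", kill: KillFn = nodeKill): KillOutcome {
  try {
    kill(pid, signal);
    return "signalled";
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "EPERM") return "not-permitted";  // caller may not signal this process
    if (code === "ESRCH") return "already-exited"; // no such process
    throw err; // anything else is unexpected, so let it surface
  }
}

// Signal every PID in a list; one failure does not abort the loop.
function killAll(pids: number[], kill: KillFn = nodeKill): Map<number, KillOutcome> {
  const outcomes = new Map<number, KillOutcome>();
  pids.forEach((pid) => outcomes.set(pid, safeKill(pid, "SIGTERM", kill)));
  return outcomes;
}
```

With something along these lines, a PID that cannot be signalled would show up as `"not-permitted"` in the result, and the caller could report it to the user while the session stays alive.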

### Actual Behavior
Session crashed with EPERM, all context and conversation history lost.

## Root Cause Analysis

### Immediate Cause
Chaining `systemctl restart` (which can take time) with `sleep 5` and `status | head -15` exceeded the 2-minute Bash timeout, triggering an EPERM error during process termination.

### Deeper Issue
Despite having explicit crash prevention rules in context:
1. Claude had full knowledge of the dangerous patterns
2. Rules were read 5-10 minutes prior to the crash
3. The specific pattern used was covered in the rules
4. Cognitive bias toward "verify immediately" overpowered rule-checking

This suggests **documentation-based prevention is insufficient** - dangerous patterns need **architectural enforcement** at the tool execution layer.

## Pattern Recognition Issues

### This Pattern Recurs
From `/opt/CLAUDE.md`:
```markdown
### 7. **⚠️ LITERAL VALUE CORRUPTION** - CRITICAL DATA INTEGRITY BUG (GitHub #3004)
```

The crash prevention rules are documented similarly to other critical rules, but compliance is inconsistent across sessions. This is not an isolated incident.

Medium Priority (Feature Requests)

Command Validation Layer: Optional pre-execution validation against known dangerous patterns
Safety Mode Toggle: Allow users to enable strict mode that rejects dangerous patterns
Timeout Warnings: Warn when commands approach timeout threshold instead of hard failure
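
As a rough illustration of the Command Validation Layer idea, the sketch below checks a command string against two of the patterns listed in the CLAUDE.md rules (a `-f` follow flag, and piping into `head`). Everything here is hypothetical: the rule names, the regular expressions, and the `validateCommand` function are assumptions made for this example, not an existing Claude Code API, and a real validator would need a proper shell parser rather than regexes.

```typescript
// Hypothetical pre-execution check; rule names and patterns are illustrative only.
interface Finding {
  rule: string;
  reason: string;
}

const RULES: { rule: string; pattern: RegExp; reason: string }[] = [
  {
    rule: "follow-flag",
    // journalctl/tail with a short option group ending in "f", e.g. -f or -nf
    pattern: /\b(?:journalctl|tail)\b[^|;&]*\s-[a-zA-Z]*f\b/,
    reason: "follow mode never exits on its own and will hit the timeout",
  },
  {
    rule: "pipe-to-head",
    pattern: /\|\s*head\b/,
    reason: "piping a possibly long-running command into head can hit the timeout",
  },
];

// Return every rule the command trips; an empty array means no known pattern matched.
function validateCommand(command: string): Finding[] {
  return RULES.filter((r) => r.pattern.test(command)).map(({ rule, reason }) => ({ rule, reason }));
}
```

A strict mode could refuse to run any command for which `validateCommand` returns findings, while the default mode could only print them as warnings. The chained command from this report would be flagged by the `pipe-to-head` rule.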

Low Priority (Documentation)

Document that rules in project files may not reliably affect LLM behavior
Recommend architectural enforcement over documentation-based prevention
Add crash recovery guidance for users

Reproduction Steps

Have crash prevention rules in CLAUDE.md
Read CLAUDE.md at session startup

Execute any chained systemctl command that might take >2 minutes:

sudo systemctl restart service.service && sleep 5 && sudo systemctl status service.service

Wait for timeout (120 seconds)
Observe EPERM crash

Expected Fix Verification

After fix is applied:

Chained systemctl commands should either be rejected or timeout gracefully
EPERM during process termination should not crash the session
User should receive error message and maintain context
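
One way to read the last point is that a failure inside a tool call should come back as a result rather than as an uncaught exception. The sketch below shows that shape under stated assumptions: `ToolResult` and `runToolSafely` are hypothetical names for this example, and nothing here reflects how Claude Code is actually structured.

```typescript
// Hypothetical tool boundary: errors become data instead of crashing the caller.
type ToolResult =
  | { ok: true; output: string }
  | { ok: false; error: string };

function runToolSafely(execute: () => string): ToolResult {
  try {
    return { ok: true, output: execute() };
  } catch (err) {
    // Keep the message so it can be shown to the user; the session keeps its context.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

For a fix verified this way, a tool call that fails with `kill EPERM` would yield `{ ok: false, error: "kill EPERM" }`, which can be displayed while the conversation continues.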

Additional Context

This crash occurred during active development work after successfully fixing a code bug. The loss of context required re-reading multiple files and re-establishing session state. The crash prevention rules were demonstrably insufficient despite being well-documented and recently read.

User Impact

Complete loss of session context
Need to re-establish project state
Interruption of active debugging workflow
Repeated pattern across multiple sessions (per user report)

What Should Happen?

Claude should either not execute the command or should not crash on EPERM.

Error Messages/Logs

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:232:13)
    at ZwA (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2927)
    at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:195:2800
    at Array.forEach (<anonymous>)

Steps to Reproduce

Execute a long command that exceeds two minutes in runtime. Example:

sudo systemctl restart && sleep 5 && sudo systemctl status | head -15

The restart has to take long enough for the whole chain to exceed the timeout.

Claude Model

Sonnet (default)

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

2.0.21 (Claude Code)

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

No response

Oct 17 '25 15:10 joetomasone