[Bug] High idle resource usage: orphaned Playwright processes and runaway ipython/pytest
Reported: 2025-09-23 03:55:37
Summary
In an Agent Zero container, unusually high CPU and RAM usage was observed while idle. Investigation found:
- A runaway ipython/pytest process consuming ~100% CPU and up to ~20 GiB RAM.
- Dozens of orphaned Playwright headless_shell processes and multiple run-driver (node) processes.
Cleaning up the runaway process and the orphaned Playwright processes immediately freed ~27 GiB of memory and noticeably reduced system load.
Environment
- Platform: Docker container (Debian/Kali base)
- Agent Zero environment: /a0 (browser_agent using Playwright)
- Services intentionally kept running: /a0/run_ui.py, /a0/run_tunnel.py
- Version: M v0.9.5-1 25-09-03 08:55
Symptoms
- High CPU and RAM usage without active user tasks
- Many headless_shell and Playwright driver processes with no apparent controller
- A few defunct (zombie) node processes (no resource impact but indicate incomplete cleanup)
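A minimal sketch for quantifying these symptoms: it counts Playwright browser and driver processes and lists zombies together with their parent PID, which is the process that failed to reap them. The function names are hypothetical, and the regexes are assumptions based on the process names in this report (the headless_shell pattern is deliberately loose so it also matches Playwright's default ms-playwright cache path).

```bash
#!/usr/bin/env bash
# Sketch: count Playwright browser/driver processes and list zombies.
# Function names and regexes are illustrative assumptions.
set -euo pipefail

# Count stdin lines (one command line per line) matching an ERE.
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_matching() {
  grep -Ec -- "$1" || true
}

# Read "STAT PID PPID COMM" lines on stdin and print "PID PPID COMM" for
# zombies (STAT starting with Z). PPID is the parent that has not reaped them.
zombies_from_ps() {
  awk '$1 ~ /^Z/ { print $2, $3, $4 }'
}

# Live report built on the helpers above (needs procps ps).
playwright_census() {
  local args
  args=$(ps -eo args=)
  echo "headless_shell: $(count_matching 'playwright.*/headless_shell' <<<"$args")"
  echo "run-driver:     $(count_matching 'playwright/driver/package/cli\.js run-driver' <<<"$args")"
  echo "zombies (PID PPID COMM):"
  ps -eo stat=,pid=,ppid=,comm= | zombies_from_ps
}
```

Usage: source the file and run playwright_census. Because the parsing helpers read plain text, they can also be applied to saved ps output.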
Reproduction (likely scenarios)
- Tests or browser_agent tasks raise exceptions or are terminated before page/context/browser close() is called, leaving Playwright child processes orphaned.
- Long-lived browsers without a TTL/idle timeout increase the risk of process and memory leaks.
Expected behavior
- All pages, contexts, and browsers are closed at task end (success or failure)
- No orphan headless_shell or run-driver processes
- Low idle resource usage
Actual behavior
- Orphan Playwright headless_shell and run-driver processes remained
- A runaway ipython/pytest process consumed ~100% CPU and up to ~20 GiB of RAM
Diagnostics (excerpt)
===== Finding and stopping Playwright headless_shell processes =====
Found 90 headless_shell processes: 3837 3841 3842 3858 3859 3876 10480 10484 10485 10502 10505 10508 35284 35292 35293 35308 35329 35346 41990 41992 41993 42012 42013 42022 43849 43851 43852 43869 43870 43883 46868 46870 46871 46888 46889 46906 53102 53106 53107 53122 53123 53142 61588 61590 61591 61608 61609 61626 66878 66880 66881 66896 66897 66914 67365 67367 67368 67383 67384 67402 71045 71047 71048 71065 71066 71072 74950 74952 74953 74970 74971 74986 76443 76445 76446 76461 76463 76468 87824 87826 87827 87846 87847 87851 96499 96501 96502 96519 96520 96538
===== Finding and stopping Playwright driver (node) processes =====
Found 15 Playwright driver processes: 3810 10453 35238 41963 43820 46839 53075 61559 66867 67354 71014 74937 76414 87795 96472
===== Leaving the UI and tunnel running (run_ui.py, run_tunnel.py) =====
28 python /a0/run_tunnel.py --dockerized=true --port=80 --tunnel_api_port=55520 --host=0.0.0.0 --code_exec_docker_enabled=false --code_exec_ssh_enabled=true
29 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
===== Status after cleanup =====
Tue Sep 23 03:48:28 AM UTC 2025
03:48:28 up 23:03, 0 users, load average: 1.38, 1.22, 1.12
-- FREE --
total used free shared buff/cache available
Mem: 31Gi 2.3Gi 27Gi 5.3Mi 1.7Gi 29Gi
Swap: 8.0Gi 3.3Gi 4.7Gi
-- CPU TOP (ps) --
PID USER COMMAND %CPU %MEM ELAPSED COMMAND
29 root pt_main_thread 5.7 3.0 14:03:55 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
67354 root node 0.1 0.0 12:29:58 [node]
87795 root node 0.1 0.0 13:31:23 [node]
46839 root node 0.1 0.0 12:38:59 [node]
61559 root node 0.1 0.0 12:33:58 [node]
3810 root node 0.1 0.0 13:25:33 [node]
41963 root node 0.1 0.0 13:46:10 [node]
43820 root node 0.1 0.0 13:44:55 [node]
76414 root node 0.1 0.0 13:34:54 [node]
66867 root node 0.1 0.0 12:31:24 [node]
10453 root node 0.1 0.0 13:23:14 [node]
35238 root node 0.1 0.0 13:48:40 [node]
71014 root node 0.1 0.0 13:36:29 [node]
96472 root node 0.1 0.0 13:28:14 [node]
-- MEM TOP (ps) --
PID USER COMMAND %CPU %MEM ELAPSED COMMAND
29 root pt_main_thread 5.7 3.0 14:03:55 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
1 root supervisord 0.0 0.0 14:03:56 /usr/bin/python3 /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
93067 root bash 0.0 0.0 06:40:38 /bin/bash
4969 root ps 0.0 0.0 00:00 ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pmem
69628 root bash 0.0 0.0 12:22:28 /bin/bash
69045 root bash 0.0 0.0 12:23:58 /bin/bash
4970 root head 0.0 0.0 00:00 head -n 15
28 root pt_main_thread 0.0 0.0 14:03:55 python /a0/run_tunnel.py --dockerized=true --port=80 --tunnel_api_port=55520 --host=0.0.0.0 --code_exec_docker_enabled=false --code_exec_ssh_enabled=true
26 searxng Thread-41 (proc 0.0 0.0 14:03:55 python /usr/local/searxng/searxng-src/searx/webapp.py
25 root cron 0.0 0.0 14:03:55 /usr/sbin/cron -f
39102 root bash 0.0 0.0 13:47:06 /bin/bash
24 root python3 0.0 0.0 14:03:55 python3 /exe/supervisor_event_listener.py
27 root sshd 0.0 0.0 14:03:55 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
43493 root bash 0.0 0.0 13:45:01 /bin/bash
===== Checking ipython/pytest (PID 72662) and force-killing it if still alive =====
PID 72662 is not alive
===== Killing Playwright headless_shell processes (KILL) =====
===== Killing Playwright driver (node) processes (KILL) =====
===== Leaving the UI and tunnel running (run_ui.py, run_tunnel.py) =====
28 python /a0/run_tunnel.py --dockerized=true --port=80 --tunnel_api_port=55520 --host=0.0.0.0 --code_exec_docker_enabled=false --code_exec_ssh_enabled=true
29 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
===== Remaining Playwright processes =====
None remaining
===== Status after cleanup =====
Tue Sep 23 03:49:12 AM UTC 2025
03:49:12 up 23:03, 0 users, load average: 0.81, 1.10, 1.08
-- FREE --
total used free shared buff/cache available
Mem: 31Gi 2.3Gi 27Gi 5.3Mi 1.7Gi 28Gi
Swap: 8.0Gi 3.2Gi 4.8Gi
-- CPU TOP (ps) --
PID USER COMMAND %CPU %MEM ELAPSED COMMAND
29 root pt_main_thread 5.7 3.1 14:04:38 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
67354 root node 0.1 0.0 12:30:42 [node]
87795 root node 0.1 0.0 13:32:07 [node]
46839 root node 0.1 0.0 12:39:43 [node]
61559 root node 0.1 0.0 12:34:42 [node]
3810 root node 0.1 0.0 13:26:17 [node]
41963 root node 0.1 0.0 13:46:54 [node]
43820 root node 0.1 0.0 13:45:39 [node]
76414 root node 0.1 0.0 13:35:37 [node]
10453 root node 0.1 0.0 13:23:57 [node]
66867 root node 0.1 0.0 12:32:07 [node]
-- MEM TOP (ps) --
PID USER COMMAND %CPU %MEM ELAPSED COMMAND
29 root pt_main_thread 5.7 3.1 14:04:38 python /a0/run_ui.py --dockerized=true --port=80 --host=0.0.0.0
1 root supervisord 0.0 0.0 14:04:40 /usr/bin/python3 /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
93067 root bash 0.0 0.0 06:41:22 /bin/bash
5425 root ps 0.0 0.0 00:00 ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pmem
69628 root bash 0.0 0.0 12:23:12 /bin/bash
69045 root bash 0.0 0.0 12:24:42 /bin/bash
5426 root head 0.0 0.0 00:00 head -n 12
28 root pt_main_thread 0.0 0.0 14:04:38 python /a0/run_tunnel.py --dockerized=true --port=80 --tunnel_api_port=55520 --host=0.0.0.0 --code_exec_docker_enabled=false --code_exec_ssh_enabled=true
26 searxng Thread-41 (proc 0.0 0.0 14:04:38 python /usr/local/searxng/searxng-src/searx/webapp.py
25 root cron 0.0 0.0 14:04:38 /usr/sbin/cron -f
39102 root bash 0.0 0.0 13:47:50 /bin/bash
Reference triage commands
date && uptime
free -h
ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pcpu | head -n 25
ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pmem | head -n 25
pgrep -af python || true; pgrep -af node || true; pgrep -af 'chrome|chromium|playwright' || true
ss -tulpn | sed -n '1,40p' # if available
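The commands above can be wrapped so that their combined output lands in a single timestamped file that can be attached to a report. This is a sketch: the function name triage_snapshot and the default /tmp output directory are assumptions, and any tool missing from the container is skipped instead of aborting the run.

```bash
#!/usr/bin/env bash
# Sketch: save the reference triage commands' output to one timestamped file.
# triage_snapshot and its default output directory are assumptions.
set -euo pipefail

triage_snapshot() {
  local outdir=${1:-/tmp}
  local out
  out="$outdir/triage-$(date -u +%Y%m%dT%H%M%SZ).txt"
  {
    # Each step tolerates a missing tool (or SIGPIPE from head) via "|| true".
    { date -u && uptime; } || true
    echo '-- FREE --'
    free -h || true
    echo '-- CPU TOP --'
    ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pcpu | head -n 25 || true
    echo '-- MEM TOP --'
    ps -eo pid,user,comm:24,pcpu,pmem,etime,args --sort=-pmem | head -n 25 || true
    echo '-- PROCESSES --'
    pgrep -af python || true
    pgrep -af node || true
    pgrep -af 'chrome|chromium|playwright' || true
    echo '-- SOCKETS --'
    ss -tulpn 2>/dev/null | sed -n '1,40p' || true
  } >"$out" 2>&1
  echo "$out"
}
```

The function prints the path of the file it wrote, e.g. f=$(triage_snapshot /a0/tmp).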
Hotfix applied
- Terminated the runaway ipython/pytest process (PID 72662) with SIGTERM, escalating to SIGKILL if needed; by the time of the forced-kill check it had already exited
- Terminated all Playwright headless_shell processes and run-driver processes (SIGTERM→SIGKILL)
- Kept only UI and tunnel processes running
- Result: ~27 GiB of memory freed, system load dropped, and a few harmless zombie processes remained
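The SIGTERM → SIGKILL escalation used in the hotfix can be captured as a reusable shell function. This is a sketch: the name terminate_gracefully and the 0.1 s polling interval are assumptions, and the grace period is given in whole seconds.

```bash
#!/usr/bin/env bash
# Sketch: SIGTERM -> SIGKILL escalation as used in the hotfix.
# terminate_gracefully and its 0.1 s polling interval are assumptions.
set -euo pipefail

# Usage: terminate_gracefully GRACE_SECONDS PID...
# Sends SIGTERM to every PID, polls until all have exited or GRACE_SECONDS
# (whole seconds) have passed, then sends SIGKILL to any that remain.
# Note: kill -0 also succeeds for zombies, so an unreaped child counts as alive.
terminate_gracefully() {
  local grace=$1
  shift
  local pid alive ticks=0
  for pid in "$@"; do
    kill -TERM "$pid" 2>/dev/null || true
  done
  while (( ticks < grace * 10 )); do
    alive=0
    for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then alive=1; fi
    done
    if (( alive == 0 )); then
      return 0  # everything exited after SIGTERM
    fi
    sleep 0.1
    ticks=$((ticks + 1))
  done
  for pid in "$@"; do
    kill -KILL "$pid" 2>/dev/null || true
  done
}
```

For example, terminate_gracefully 5 $(pgrep -f 'cli\.js run-driver') would give the driver processes five seconds to exit before force-killing them.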
Proposed fixes (permanent)
- Robust browser_agent teardown across all code paths
  - Use try/finally to defensively call await page.close(), await context.close(), and await browser.close()
  - Ensure teardown also runs on error paths and on interrupts/cancellation
- TTL / idle-timeout for browsers and contexts
  - Add configurable TTL (e.g., 5–15 min) per browser/context; if no activity, auto-close
  - Provide sane defaults in browser_agent settings; allow per-task override
- Orphan cleanup as a fallback
  - Lightweight cleanup on task/agent completion to kill orphan headless/driver processes (scoped patterns, PID tree)
  - Optional system-level cleanup (cron/systemd timer)
  - Provide optional script & docs; example:
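    #!/usr/bin/env bash
    # agent_cleanup.sh: kill leftover Playwright processes (TERM, then KILL).
    # Caution: these patterns match every Playwright browser/driver in the
    # container, including ones an active task is still using.
    set -euo pipefail
    # pkill exits 1 when nothing matches; "|| true" keeps set -e from aborting.
    pkill -TERM -f '/playwright/.*/headless_shell' || true
    sleep 1
    pkill -KILL -f '/playwright/.*/headless_shell' || true
    pkill -TERM -f 'playwright/driver/package/cli.js run-driver' || true
    sleep 1
    pkill -KILL -f 'playwright/driver/package/cli.js run-driver' || true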
  Cron example:
    */30 * * * * /usr/local/bin/agent_cleanup.sh >/var/log/agent_cleanup.log 2>&1
- Tests and acceptance criteria
  - Fault injection: crash the test harness before browser.close() and assert that no orphans remain after 60 s (open question: depending on the task the (sub)agent is running, this may need to be several minutes)
  - browser_agent unit/integration tests: verify that teardown calls the close() chain on all code paths
  - Acceptance: no orphan headless_shell/run-driver processes remain more than 60 s after task end, and idle usage returns to baseline (same open question about the 60 s window)
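For the TTL-style fallback and the "no orphans after 60 s" acceptance check, detection can be restricted by process age instead of matching every Playwright process. This is a sketch, assuming procps ps with the etimes (elapsed seconds) field; the combined regex PW_PATTERN, the 60-second default, and the function names are illustrative assumptions. As noted above, the threshold must exceed the longest legitimate task, otherwise browsers still in use will be reported.

```bash
#!/usr/bin/env bash
# Sketch: age-scoped detection of leftover Playwright processes.
# PW_PATTERN, the 60 s default and the function names are assumptions.
set -euo pipefail

# Browser binaries and the Python driver, as named in this report.
PW_PATTERN='playwright.*/headless_shell|playwright/driver/package/cli\.js run-driver'

# Read "PID ELAPSED_SECONDS ARGS..." lines on stdin and print each PID whose
# ARGS match PATTERN (ERE) and whose age is strictly greater than MIN_AGE.
# The pattern goes through ENVIRON so awk does not reinterpret backslashes.
old_matching_pids() {
  local pattern=$1 min_age=$2
  PAT="$pattern" awk -v min="$min_age" '
    { pid = $1; age = $2; $1 = ""; $2 = "" }   # keep only ARGS in $0
    age + 0 > min + 0 && $0 ~ ENVIRON["PAT"] { print pid }'
}

# Live wrapper: needs procps ps, whose "etimes" field is elapsed seconds.
stale_playwright_pids() {
  ps -eo pid=,etimes=,args= | old_matching_pids "$PW_PATTERN" "${1:-60}"
}

# Acceptance check: return 1 if any matching process is older than MIN_AGE.
assert_no_stale_playwright() {
  local pids
  pids=$(stale_playwright_pids "${1:-60}")
  if [[ -n $pids ]]; then
    echo "stale Playwright processes: ${pids//$'\n'/ }" >&2
    return 1
  fi
}
```

Run 60 s or more after a task has ended, assert_no_stale_playwright 60 fails if any browser or driver process from that task outlived the window; stale_playwright_pids can feed the same PIDs to a TERM-then-KILL step.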
Impact
- High resource usage, potential OOM risk, degraded performance for other tasks running in the container
Additional notes
- Zombie (defunct) processes use no CPU or RAM (only a process-table entry and a PID), but they show that the parent never reaped them with wait()
- Priority: deterministic teardown + TTL; system-level cleanup acts as a safety net
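To keep system-level cleanup scoped (the "PID tree" idea under the proposed fixes) rather than matching by name across the whole container, a safety-net script could target only the descendants of a known controller process. A minimal sketch that derives those descendants from "ps -eo pid=,ppid=" output; the function names are hypothetical, and choosing the root PID is left to the caller.

```bash
#!/usr/bin/env bash
# Sketch: list every descendant of a root PID (the "PID tree"), so cleanup can
# be limited to processes spawned by a known controller. Names are hypothetical.
set -euo pipefail

# Read "PID PPID" lines on stdin and print all descendants of ROOT,
# breadth-first (parents before their children).
descendants_of() {
  awk -v root="$1" '
    { kids[$2] = kids[$2] " " $1 }          # children grouped by parent PID
    END {
      queue[1] = root; head = 1; tail = 1
      while (head <= tail) {
        n = split(kids[queue[head++]], c, " ")
        for (i = 1; i <= n; i++) { queue[++tail] = c[i]; print c[i] }
      }
    }'
}

# Live wrapper (needs procps ps). The ps/awk helpers themselves appear in the
# result when ROOT is an ancestor of the current shell.
live_descendants() {
  ps -eo pid=,ppid= | descendants_of "$1"
}
```

The resulting PIDs can then be sent SIGTERM and, after a grace period, SIGKILL, as in the hotfix.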
Next step
- Requesting a PR to:
  - Strengthen teardown in browser_agent
  - Add a configurable TTL
  - Add an optional agent_cleanup.sh plus documentation (cron/systemd timer)
  - Add tests and a fault-injection scenario that guards against orphans
Source of the diagnostics excerpt: /a0/tmp/chats/RexEolrU/messages/117.txt