E2B icon indicating copy to clipboard operation
E2B copied to clipboard

Executing command or code can timeout when all CPU/RAM is used up

Open ValentaTomas opened this issue 2 years ago • 8 comments

When you execute a command or code that takes up all the CPU/RAM it can block communication and make the SDK request timeout because there is no response.

ValentaTomas avatar Mar 24 '24 21:03 ValentaTomas

With testing code loop:

import logging
import time

from dotenv import load_dotenv
from e2b import Sandbox

load_dotenv()

logging.basicConfig(level=logging.DEBUG)


def run_code(code, num):
    template_id = "code-interpreter"
    binary = "python3"
    filepath = "/index.py"

    print(f"\n\n------------(START - {num})-------------\n\n")
    with Sandbox(template=template_id, timeout=15) as sandbox:
        print(f"\n\n------------(SANDBOX - {sandbox.id})-------------\n\n")
        sandbox.filesystem.write(filepath, code, 15)

        # We use start and wait separately here instead of start_and_wait in order to define the timeout for the wait explicitly
        process = sandbox.process.start(cmd=f"{binary} {filepath}", timeout=30)
        try:
            process.wait(timeout=10)
        except:
            print("TimeoutError -> writing to /timeout")
            sandbox.filesystem.write("/timeout", "true", 15)
            raise
        print(f"\n\n------------(END - {num} - {sandbox.id})-------------\n\n")


for i in range(1000):
    run_code("str('')", i)
    time.sleep(1)

the error logs are:

------------(START - 36)-------------
INFO:e2b.sandbox.main:Creating sandbox code-interpreter
INFO:e2b.sandbox.sandbox_connection:Sandbox for template code-interpreter initialized
INFO:e2b.sandbox.main:Opening sandbox code-interpreter
INFO:e2b.sandbox.sandbox_connection:Sandbox fv7bfqp4wyo42829htzt created (id:ifjurtskbfr0jf8alwukk-40c5eb20)
INFO:e2b.sandbox.sandbox_connection:Started refreshing sandbox fv7bfqp4wyo42829htzt (id: ifjurtskbfr0jf8alwukk-40c5eb20)
DEBUG:asyncio:Using selector: KqueueSelector
INFO:e2b.sandbox.sandbox_rpc:WebSocket waiting to start
DEBUG:e2b.sandbox.websocket_client:WebSocket connecting to wss://49982-ifjurtskbfr0jf8alwukk-40c5eb20.e2b.dev/ws
INFO:e2b.sandbox.websocket_client:WebSocket connected to wss://49982-ifjurtskbfr0jf8alwukk-40c5eb20.e2b.dev/ws
DEBUG:e2b.sandbox.websocket_client:WebSocket starting to send messages
INFO:e2b.sandbox.sandbox_rpc:WebSocket started
INFO:e2b.sandbox.main:Sandbox code-interpreter opened
------------(SANDBOX - ifjurtskbfr0jf8alwukk-40c5eb20)-------------
DEBUG:e2b.sandbox.filesystem:Writing file /index.py
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/index.py", "str('')"], "id": 1}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/index.py", "str('')"], "id": 1}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/index.py", "str('')"], "id": 1}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/index.py", "str('')"], "id": 1}
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","id":1,"result":null}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","id":1,"result":null}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","id":1,"result":null}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: [1]
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 1
DEBUG:e2b.sandbox.filesystem:Wrote file /index.py
INFO:e2b.sandbox.process:Starting process: python3 /index.py
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "qXHQf12FhSaU"], "id": 2}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStdout", "qXHQf12FhSaU"], "id": 3}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "qXHQf12FhSaU"], "id": 2}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStderr", "qXHQf12FhSaU"], "id": 4}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStdout", "qXHQf12FhSaU"], "id": 3}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "qXHQf12FhSaU"], "id": 2}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStderr", "qXHQf12FhSaU"], "id": 4}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "qXHQf12FhSaU"], "id": 2}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStdout", "qXHQf12FhSaU"], "id": 3}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStdout", "qXHQf12FhSaU"], "id": 3}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStderr", "qXHQf12FhSaU"], "id": 4}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onStderr", "qXHQf12FhSaU"], "id": 4}
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","id":2,"result":"0xfa10253392e0945d8eeb40e6376ed84d"}
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","id":3,"result":"0x5da45d078b642df698fb51e7a499cccf"}
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","id":4,"result":"0x28ddaa165d77b226c3e4c2a9a1c9b9d"}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","id":2,"result":"0xfa10253392e0945d8eeb40e6376ed84d"}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","id":2,"result":"0xfa10253392e0945d8eeb40e6376ed84d"}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: [2, 3, 4]
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","id":3,"result":"0x5da45d078b642df698fb51e7a499cccf"}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","id":3,"result":"0x5da45d078b642df698fb51e7a499cccf"}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: [2, 3, 4]
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","id":4,"result":"0x28ddaa165d77b226c3e4c2a9a1c9b9d"}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","id":4,"result":"0x28ddaa165d77b226c3e4c2a9a1c9b9d"}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: [2, 3, 4]
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 3
DEBUG:e2b.sandbox.sandbox_connection:Subscribed to process on_stdout (sub id: 0x5da45d078b642df698fb51e7a499cccf)
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 2
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 4
DEBUG:e2b.sandbox.sandbox_connection:Subscribed to process on_exit (sub id: 0xfa10253392e0945d8eeb40e6376ed84d)
DEBUG:e2b.sandbox.sandbox_connection:Subscribed to process on_stderr (sub id: 0x28ddaa165d77b226c3e4c2a9a1c9b9d)
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "process_start", "params": ["qXHQf12FhSaU", "python3 /index.py", {"PYTHONUNBUFFERED": "1"}, ""], "id": 5}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "process_start", "params": ["qXHQf12FhSaU", "python3 /index.py", {"PYTHONUNBUFFERED": "1"}, ""], "id": 5}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "process_start", "params": ["qXHQf12FhSaU", "python3 /index.py", {"PYTHONUNBUFFERED": "1"}, ""], "id": 5}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "process_start", "params": ["qXHQf12FhSaU", "python3 /index.py", {"PYTHONUNBUFFERED": "1"}, ""], "id": 5}
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","id":5,"result":"qXHQf12FhSaU"}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","id":5,"result":"qXHQf12FhSaU"}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","id":5,"result":"qXHQf12FhSaU"}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: [5]
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 5
INFO:e2b.sandbox.process:Started process (id: qXHQf12FhSaU)
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox isc5zlsuh9yylauaey00w-40c5eb20
DEBUG:e2b.sandbox.sandbox_connection:Cannot refresh sandbox - it was closed. isc5zlsuh9yylauaey00w-40c5eb20
INFO:e2b.sandbox.sandbox_connection:Stopped refreshing sandbox (id: isc5zlsuh9yylauaey00w-40c5eb20)
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox ifjurtskbfr0jf8alwukk-40c5eb20
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox ifjurtskbfr0jf8alwukk-40c5eb20
TimeoutError -> writing to /timeout
DEBUG:e2b.sandbox.filesystem:Writing file /timeout
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6}
DEBUG:e2b.sandbox.websocket_client:WebSocket message sent: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6}
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox ifjurtskbfr0jf8alwukk-40c5eb20
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox ifjurtskbfr0jf8alwukk-40c5eb20
ERROR:e2b.sandbox.sandbox_rpc:WebSocket timed out while waiting for: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6} 
ERROR:e2b.sandbox.sandbox_rpc:WebSocket received error while waiting for: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6} WebSocket timed out while waiting for: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6} 
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 6
INFO:e2b.sandbox.sandbox_connection:Closing sandbox fv7bfqp4wyo42829htzt (id: ifjurtskbfr0jf8alwukk-40c5eb20)
INFO:e2b.sandbox.sandbox_connection:Sandbox closed
Traceback (most recent call last):
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/example.py", line 25, in run_code
INFO:e2b.sandbox.websocket_client:WebSocket stopped
    process.wait(timeout=10)
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/process.py", line 184, in wait
    raise TimeoutException(f"Process did not finish within {timeout} seconds")
e2b.sandbox.exception.TimeoutException: Process did not finish within 10 seconds
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/sandbox_rpc.py", line 136, in send_message
    r = future_reply.result(timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/utils/future.py", line 23, in result
    return self._future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 458, in result
    raise TimeoutError()
TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/example.py", line 34, in <module>
    run_code("str('')", i)
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/example.py", line 28, in run_code
    sandbox.filesystem.write("/timeout", "true", 15)
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/filesystem.py", line 111, in write
    self._sandbox._call(
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/sandbox_connection.py", line 314, in _call
    return self._rpc.send_message(f"{service}_{method}", params, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/sandbox_rpc.py", line 145, in send_message
    raise e
  File "/Users/valenta.and.thomas/Developer/e2b/packages/python-sdk/e2b/sandbox/sandbox_rpc.py", line 139, in send_message
    raise TimeoutException(
e2b.sandbox.exception.TimeoutException: WebSocket timed out while waiting for: {"jsonrpc": "2.0", "method": "filesystem_write", "params": ["/timeout", "true"], "id": 6} 

All Grafana logs for this instance are:

2024-03-22 16:15:15.136	
processSvc
Started new process
2024-03-22 16:15:15.124	
processSvc
Process with ID doesn't exist yet. Creating a new process
2024-03-22 16:15:15.117	
processSvc
Start process
2024-03-22 16:15:15.065	
processSvc
Subscribed to process stderr
2024-03-22 16:15:15.060	
processSvc
Subscribed to process stdout
2024-03-22 16:15:15.054	
processSvc
Subscribed to process exit
2024-03-22 16:15:15.048	
processSvc
Subscribing to process stderr
2024-03-22 16:15:15.043	
processSvc
Subscribing to process exit
2024-03-22 16:15:15.036	
processSvc
Subscribing to process stdout



2024-03-22 16:15:14.909	
filesystemSvc
Write file

ValentaTomas avatar Mar 24 '24 21:03 ValentaTomas

After looking at the looks it looks like the whole sandbox becomes unresponsive after the process starts — the second filesystem write is not executed when we try to do that after catching the timeout exception.

ValentaTomas avatar Apr 02 '24 01:04 ValentaTomas

I want to test this in JS too, to check if this is not some problem with WS/connection implementation.

ValentaTomas avatar Apr 02 '24 01:04 ValentaTomas

I also get timeout in JS. I set timeouts to 5min but it timeouts after 60sec. This happens with any startAndWait call.

TimeoutError: Promise timed out after 60000 milliseconds.

jsuhncc3d avatar Apr 05 '24 06:04 jsuhncc3d

Error

ERROR:e2b.sandbox.websocket_client:WebSocket received error while receiving messages: no close frame received or sent

Is probably related — when the sandbox stops responding during the sending of message the message becomes mangled.

ValentaTomas avatar Apr 10 '24 22:04 ValentaTomas

I'm re-opening this issue because it's still happening to our users. It's most likely connected to a sandbox running out of memory.

This issue should stay open until

  • [ ] We have a better and friendlier error message mentioning OOM
  • [ ] We have good CPU + RAM usage monitoring for our users

mlejva avatar Apr 21 '24 21:04 mlejva

We should also improve the OOM handling by exploring the following:

  • zram
  • zswap
  • swap
  • uresourced
  • systemd-oomd
  • changing oom_score_adj for envd, start command and for the processes started via envd (the score is by default inherited)

ValentaTomas avatar Apr 23 '24 20:04 ValentaTomas

This issue is not specific to websockets, but to the management of resource in the sandbox.

ValentaTomas avatar Jul 23 '24 09:07 ValentaTomas