
Security & Performance Issue: Unlimited Code Output in LLM Context

Open ovenzeze opened this issue 1 year ago • 1 comment

Issue Description

When executing code that produces large amounts of output (e.g., directory listings, file contents, system information), all output is sent to the LLM in its entirety before being truncated in the final response. This raises both security and performance concerns:

  1. Security Risk:

    • Sensitive information in large outputs (logs, system info, file contents) is sent to the LLM
    • Even if truncated in the final response, the LLM has already processed the complete output
    • This could lead to unintended data exposure
  2. Performance Impact:

    • Unnecessary token consumption when sending large outputs to the LLM
    • Increased API costs
    • Potential context window overflow

Example

# Simple code that generates large output
import os
for root, dirs, files in os.walk("/"):
    print(f"Directory: {root}")
    for file in files:
        print(f"  File: {file}")

Current behavior:

  1. Code executes and generates complete output
  2. Complete output is sent to LLM
  3. LLM processes all output
  4. Response is truncated for display
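The current behavior described above can be sketched as a minimal runnable flow, with stub stand-ins for code execution and the LLM call (all names here are hypothetical, not Open Interpreter's actual internals):

```python
# Stand-in for real code execution producing a large output.
def execute(code: str) -> str:
    return "\n".join(f"  File: f{i}.txt" for i in range(10_000))

class StubLLM:
    def chat(self, prompt: str) -> str:
        # Stand-in: a real LLM would consume tokens proportional to len(prompt).
        return f"processed {len(prompt)} characters"

def run_and_report(code: str, llm, display_limit: int = 80) -> str:
    full_output = execute(code)    # complete output, no limit applied
    reply = llm.chat(full_output)  # the entire output enters the LLM context
    return reply[:display_limit]   # truncation happens only here, at display time

print(run_and_report("...", StubLLM()))
```

The point of the sketch: by the time anything is truncated, `llm.chat` has already received the full output, so both the data exposure and the token cost have already occurred.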

Proposed Solution

Add output limiting at the source (code execution) level:

  1. Add a configurable max_output_lines or max_output_bytes parameter
  2. Implement truncation during code execution, before sending to LLM
  3. Add clear indicators when output is truncated

This aligns with the project's philosophy of simplicity and security while maintaining core functionality.
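The proposed truncation-at-source could look roughly like the following sketch (the function and parameter names are illustrative, not an existing Open Interpreter API):

```python
def truncate_output(output: str, max_lines: int = 100, max_bytes: int = 10_000) -> str:
    """Truncate code-execution output before it is handed to the LLM."""
    truncated = False
    lines = output.splitlines()
    if len(lines) > max_lines:
        lines = lines[:max_lines]
        truncated = True
    result = "\n".join(lines)
    if len(result.encode("utf-8")) > max_bytes:
        # Cut on the byte budget, dropping any partial character at the boundary.
        result = result.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
        truncated = True
    if truncated:
        # Clear indicator so the LLM (and the user) know output was cut.
        result += "\n[output truncated before being sent to the LLM]"
    return result
```

Calling this on the execution result, rather than on the final response, would address both concerns at once: sensitive tail output never reaches the model, and token usage is bounded.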

Questions

  1. Would this feature align with the project's scope?
  2. Should this be configurable per execution or as a global setting?
  3. What would be a reasonable default limit?

Additional Context

This issue was discovered while building a service using Open Interpreter's API. The complete output being sent to the LLM was noticed through debug logs and token usage metrics.

Describe the solution you'd like

See "Proposed Solution" above.

Describe alternatives you've considered

No response

Additional context

No response

ovenzeze · Feb 10 '25 22:02

Try this setting: interpreter --max_output 1000 (see https://docs.openinterpreter.com/settings/all-settings#max-output)

It looks like this value defaults to 2800. It should do what you're describing: truncating output from code execution.

CyanideByte · Feb 14 '25 13:02