browser-use
Can't input text into Expedia support chat window
Bug Description
I asked the agent to modify an Expedia booking for me by talking to the Expedia support chat, but the agent has trouble locating the chat input field and entering text into it.
Reproduction Steps
- Ask the agent to modify an Expedia booking
Code Sample
I'm using web-ui
Version
0.1.37
LLM Model
GPT-4o
Operating System
macOS 14.3
Relevant Log Output
ERROR [agent] ❌ Result failed 1/3 times:
Error executing action input_text: Failed to input text into index 33
INFO [src.agent.custom_agent]
📍 Step 15
INFO [src.agent.custom_agent] ❌ Eval: Failed - Unable to input the text into the chat window with Navneet.
INFO [src.agent.custom_agent] 🧠 New Memory: Navneet is engaged and processing the request regarding flight changes.
INFO [src.agent.custom_agent] ⏳ Task Progress:
1. Open Expedia website. 2. Navigate to Support section.
INFO [src.agent.custom_agent] 📋 Future Plans:
1. Locate the chat input field again. 2. Successfully input text to confirm the return date change.
INFO [src.agent.custom_agent] 🤔 Thought: The previous attempt to input text failed, so I need to ensure the correct target for input.
INFO [src.agent.custom_agent] 🎯 Summary: Locate the chat input field and confirm the return date change to Feb 28.
INFO [src.agent.custom_agent] 🛠️ Action 1/1: {"scroll_down":{"amount":483}}
INFO [src.agent.custom_agent] 🧠 All Memory:
Chat with agent Navneet is actively in progress.
Navneet is engaged and has responded, asking for confirmation on the reservation details.
Navneet is waiting for confirmation on the reservation details.
Navneet has asked for further confirmation about the changes.
Navneet is processing the request to change United flights from DEN to LAX and cancel XXX's booking.
Chat with agent Navneet is actively in progress. Navneet is engaged and processing the request to change United flights from DEN to LAX and cancel XXX's booking.
Navneet is reviewing the flight and requesting confirmation to proceed with changes.
Navneet is engaged and processing the request regarding flight changes.
INFO [controller] 🔍 Scrolled down the page by 483 pixels
INFO [src.agent.custom_agent]
📍 Step 16
INFO [src.agent.custom_agent] ❌ Eval: Failed - The chat window is accessed but interactions were unsuccessful before.
INFO [src.agent.custom_agent] 🧠 New Memory:
INFO [src.agent.custom_agent] ⏳ Task Progress:
1. Open Expedia website. 2. Navigate to Support section.
INFO [src.agent.custom_agent] 📋 Future Plans:
1. Attempt to interact with the chat input field to confirm the return date change. 2. Send the necessary details to Navneet.
INFO [src.agent.custom_agent] 🤔 Thought: The previous attempt to input text failed. I need to re-attempt to ensure the input field is correctly targeted.
INFO [src.agent.custom_agent] 🎯 Summary: Enter the chat input to confirm the new return date with Navneet.
INFO [src.agent.custom_agent] 🛠️ Action 1/2: {"input_text":{"index":33,"text":"Please change my return date to Feb 28. Thank you!"}}
INFO [src.agent.custom_agent] 🛠️ Action 2/2: {"send_keys":{"keys":"Enter"}}
FWIW, I have had this same issue with Selenium for years, specifically with chat boxes. It got to the point where having pyautogui or AHK step in to perform the typing, and then letting Selenium continue, was easier than getting Selenium to type into a chat box on some websites.
Let me know if something like this works for your use case.
import time

import pyautogui
from browser_use import ActionResult, Controller
from pydantic import BaseModel

controller = Controller()


class ChatInputParams(BaseModel):
    x: int  # Screen x coordinate of the chatbox
    y: int  # Screen y coordinate of the chatbox
    input: str  # Text to type into the chatbox
    ctrl_enter: bool = False  # If True, use Ctrl+Enter instead of Enter


@controller.action(
    'Type input at (x, y) in chatbox and send',
    param_model=ChatInputParams
)
def type_and_send_chat(params: ChatInputParams):
    """
    Uses pyautogui to click at the provided (x, y) location, type the given input,
    then sends it by pressing Enter or Ctrl+Enter.
    """
    # Move and click at the chatbox location (pyautogui uses screen coordinates)
    pyautogui.click(params.x, params.y)
    time.sleep(0.25)  # Slight pause for UI focus

    # Type the input text
    pyautogui.typewrite(params.input, interval=0.03)

    # Send the message
    if params.ctrl_enter:
        pyautogui.hotkey('ctrl', 'enter')
    else:
        pyautogui.press('enter')

    # Let the agent know the message was sent
    return ActionResult(
        extracted_content=f"Successfully entered and sent '{params.input}' into the chatbox at ({params.x}, {params.y})"
    )
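One caveat with the action above: pyautogui clicks at screen coordinates, while an element position read from the page (for example via getBoundingClientRect()) is relative to the viewport. Below is a rough sketch of that conversion. The names WindowMetrics, Rect, and viewport_rect_to_screen_point are hypothetical helpers, not part of browser-use or pyautogui, and the math assumes 100% page zoom, symmetric side borders, all remaining browser chrome (tabs, address bar) sitting above the page, and screen coordinates that match CSS pixels. Treat it as a starting point and check the result on your own setup.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowMetrics:
    """Hypothetical container for values read in the page via JavaScript."""
    screen_x: float      # window.screenX
    screen_y: float      # window.screenY
    outer_width: float   # window.outerWidth
    outer_height: float  # window.outerHeight
    inner_width: float   # window.innerWidth
    inner_height: float  # window.innerHeight


@dataclass(frozen=True)
class Rect:
    """An element's bounding box in viewport coordinates (CSS pixels)."""
    left: float
    top: float
    width: float
    height: float


def viewport_rect_to_screen_point(rect: Rect, win: WindowMetrics) -> Tuple[int, int]:
    """Estimate the screen coordinates of the center of `rect`.

    Assumptions (not guaranteed by any browser): side borders are symmetric,
    and every extra vertical pixel of the window is chrome above the page.
    """
    border_x = (win.outer_width - win.inner_width) / 2
    chrome_top = win.outer_height - win.inner_height

    # Center of the element inside the viewport
    center_x = rect.left + rect.width / 2
    center_y = rect.top + rect.height / 2

    # Shift by the window position plus the estimated chrome offsets
    return (
        round(win.screen_x + border_x + center_x),
        round(win.screen_y + chrome_top + center_y),
    )
```

The resulting point could then be passed as the x and y parameters of type_and_send_chat. If the chat widget lives inside an iframe, the iframe's own offset within the page would also have to be added, which this sketch does not handle.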