
[Bug]: Browser path detection failing in Windmill.dev with crawl4ai

Open renatocaliari opened this issue 1 year ago • 7 comments

crawl4ai version

0.4.247

Expected Behavior

I'm trying to use crawl4ai with Windmill (https://www.windmill.dev/) for browser automation. However, I'm having trouble setting an executable path for the browser.

Issue:

The Windmill documentation (https://www.windmill.dev/docs/advanced/browser_automation#examples) provides an example for launching a browser instance:

const browser = await chromium.launch({
    executablePath: "/usr/bin/chromium",
    args: ['--no-sandbox', '--single-process', '--no-zygote', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-gpu'],
});

When running crawl4ai without configuring the specific path, I receive the following error:

Error: BrowserType.launch: Executable doesn't exist at /tmp/.cache/ms-playwright/chromium-1148/chrome-linux/chrome
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated.       ║
║ Please run the following command to download new browsers: ║
║                                                            ║
║     playwright install                                     ║
║                                                            ║
║ <3 Playwright Team                                         ║
╚════════════════════════════════════════════════════════════╝

Or the error:

INFO     Error Failed to start browser: [Errno 2] No such file or directory: 'google-chrome'

I suspect that the call browser_path = self._get_browser_path() in async_crawler_strategy.py is unable to detect the browser's location automatically in the Windmill environment.

Question:

How can I properly configure something like executablePath for the browser (e.g., Chromium or Google Chrome) when using crawl4ai within Windmill? Is there a way to manually specify the path, perhaps through an environment variable or a configuration setting within crawl4ai?
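For what it's worth, Playwright itself honors the PLAYWRIGHT_BROWSERS_PATH environment variable when resolving browser binaries, so one possible workaround is to point it at a writable location and install Chromium there before the crawler starts. This is only a sketch; the /tmp/pw-browsers directory is an example, not a recommendation:

```python
import os

# Playwright reads PLAYWRIGHT_BROWSERS_PATH when locating browser binaries.
# Pointing it at a writable directory (example path) lets "playwright install"
# download Chromium somewhere the Windmill worker can actually write to.
os.environ["PLAYWRIGHT_BROWSERS_PATH"] = "/tmp/pw-browsers"

# Download Chromium into that directory. os.system returns a non-zero exit
# code (rather than raising) if the playwright CLI is unavailable.
os.system("playwright install chromium")
```

The variable must be set before crawl4ai launches the browser, i.e. at the top of the Windmill script.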

Current Behavior

Error:

Error: BrowserType.launch: Executable doesn't exist at /tmp/.cache/ms-playwright/chromium-1148/chrome-linux/chrome
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated.       ║
║ Please run the following command to download new browsers: ║
║                                                            ║
║     playwright install                                     ║
║                                                            ║
║ <3 Playwright Team                                         ║
╚════════════════════════════════════════════════════════════╝

Or this error:

INFO     Error Failed to start browser: [Errno 2] No such file or directory: 'google-chrome'

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets

# requirements:
# crawl4ai

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
# import os

# os.system("playwright install")
# os.system("playwright install-deps")
# os.system("crawl4ai-setup")

async def scrape(url: str):
    crawler = None
    try:
        browser_config = BrowserConfig(
            headless=True,
            extra_args=[
                "--no-sandbox",
                "--single-process",
                "--no-zygote",
                "--disable-setuid-sandbox",
                "--disable-dev-shm-usage",
                "--disable-gpu",
            ],
            verbose=True,
        )
        # Pass the browser configuration to the crawler so extra_args take effect.
        crawler = AsyncWebCrawler(config=browser_config)
        await crawler.start()

        crawl_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(),
            exclude_external_links=True,
            remove_overlay_elements=True,
            process_iframes=False,
        )

        # arun is a coroutine, so it must be awaited
        result = await crawler.arun(url=url, config=crawl_config)
        return result
    finally:
        if crawler is not None:
            await crawler.close()


def main(url: str):
    result = asyncio.run(scrape(url))
    return result

OS

windmill.dev (cloud) - Linux?

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

renatocaliari avatar Jan 20 '25 20:01 renatocaliari

Just checking if anyone has had a chance to look into this issue. Any guidance would be much appreciated! 🙏

renatocaliari avatar Jan 27 '25 11:01 renatocaliari

@renatocaliari Thx for trying the library. Are you able to create a code snippet example where you simply crawl a page like https://crawl4ai.com using this browser? Then share it here for us to check. Thx

unclecode avatar Jan 28 '25 15:01 unclecode

@renatocaliari Thx for trying the library. Are you able to create a code snippet example where you simply crawl a page like https://crawl4ai.com using this browser? Then share it here for us to check. Thx

I've updated the issue with the code snippet and details of another related error.

renatocaliari avatar Jan 28 '25 17:01 renatocaliari

Having the same issue.

huotarih avatar Feb 11 '25 09:02 huotarih

Having the same issue.

Is there a way to manually specify the path, perhaps through an environment variable or a configuration setting within crawl4ai?

wss-git avatar Mar 25 '25 07:03 wss-git

Having the same issue in AWS Lambda.

RoniFinTech3 avatar Jun 09 '25 15:06 RoniFinTech3

I've done some research on Windmill. Playwright needs to download browser binaries (Chromium, Firefox, WebKit), so in Windmill's containerized environment you'd need to ensure that:

  • The browsers are pre-installed in the worker environment
  • Or the Playwright installation process can download them at runtime

Alternatively, use a custom Docker image for Windmill workers that includes Playwright and the browsers pre-installed. And if you're self-hosting Windmill, you have more control over the worker environment.
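A minimal sketch of the runtime-install option, mirroring the commented-out lines in the reporter's snippet. The marker-file path and the once-per-worker guard are illustrative assumptions, not crawl4ai features:

```python
import os
import subprocess

# Example path for a marker file; any writable location in the worker works.
MARKER = "/tmp/.browsers-installed"

def ensure_browsers() -> None:
    """Install Playwright's Chromium once per worker, skipping on later runs."""
    if os.path.exists(MARKER):
        return
    # --with-deps also installs the system libraries Chromium needs.
    # This requires network access and a writable Playwright cache;
    # check=False means a non-zero exit code is not raised as an error.
    subprocess.run(
        ["playwright", "install", "--with-deps", "chromium"],
        check=False,
    )
    open(MARKER, "w").close()
```

Calling ensure_browsers() at the top of the Windmill script keeps cold starts slow only on the first run of each worker.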

ntohidi avatar Oct 14 '25 09:10 ntohidi