[Bug]: Remote Browser Control with CDP

Open jlia0 opened this issue 11 months ago • 6 comments

crawl4ai version

0.4.248

Expected Behavior

The README says crawl4ai supports: 🔄 Remote Browser Control: Connect to Chrome Developer Tools Protocol for remote, large-scale data extraction.

However, none of the documentation explains how to use crawl4ai with a remote browser via CDP.

jlia0 avatar Mar 03 '25 19:03 jlia0

@jlia0 You are right. There's no proper documentation for this. The good news is that we are working on a command-line utility that makes managing remote browser control with Crawl4AI a breeze. I'll update the docs as soon as it's ready and comment here too.

aravindkarnam avatar Mar 08 '25 15:03 aravindkarnam

Hi @aravindkarnam, is there any progress? Thanks!!

jlia0 avatar Mar 13 '25 21:03 jlia0

@jlia0 Still testing it! Will update here as soon as it's ready for release.

aravindkarnam avatar Mar 17 '25 13:03 aravindkarnam

Can I use the remote browser feature with steel.dev?

takan55 avatar Mar 29 '25 23:03 takan55

> @jlia0 Still testing it! Will update here soon as it's ready for release.

Hey @aravindkarnam! Is this released yet?

jlia0 avatar Apr 17 '25 04:04 jlia0

@jlia0 I think you can pass the CDP URL to BrowserConfig via its cdp_url parameter.

  • Ref: https://github.com/unclecode/crawl4ai/blob/897e0173618d20fea5d8952ccdbcdad0febc0fee/crawl4ai/async_configs.py#L344

I haven't tried it yet. I'll try it and reply here with sample code if it works.

Edit: Here is sample code that worked for me:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

# Replace with your actual CDP WebSocket URL (e.g. from browserless)
CDP_URL = "wss://your-remote-host.com/devtools/browser/<id>?token=your-token"

async def main():
    async with AsyncWebCrawler(
        config=BrowserConfig(
            cdp_url=CDP_URL,
            headless=True
        )
    ) as crawler:
        result = await crawler.arun(url="https://www.google.com")
        print(result)

if __name__ == "__main__":
    asyncio.run(main())
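
If the remote browser is a Chrome instance you launched yourself with --remote-debugging-port (rather than a hosted service that hands you a URL), the WebSocket address for CDP_URL can usually be read from the browser's /json/version endpoint. Below is a minimal sketch, assuming the browser listens on localhost:9222. The host, port, and both helper names are illustrative assumptions and are not part of crawl4ai:

```python
import json
from urllib.request import urlopen


def extract_ws_url(version_payload: str) -> str:
    """Return the browser-level WebSocket URL from a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


def fetch_cdp_url(host: str = "localhost", port: int = 9222, timeout: float = 5.0) -> str:
    """Query a Chrome started with --remote-debugging-port for its CDP WebSocket URL.

    host and port are assumptions for this sketch; adjust them to your setup.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return extract_ws_url(resp.read().decode("utf-8"))
```

The value returned by fetch_cdp_url() could then be used as CDP_URL in the sample above. I have only sketched this for a self-hosted Chrome; hosted providers typically give you the wss:// URL (with a token) directly.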

manashmandal avatar May 15 '25 23:05 manashmandal