[Bug]: Error: Page.content: Target page, context or browser has been closed
crawl4ai version
0.5.0.post4
Expected Behavior
Crawler should crawl
Current Behavior
I get the following error
[ERROR]... × https://out-door.co.il/product/%d7%a4%d7%90%d7%a0%... | Error:
Unexpected error in _crawl_web at line 528 in wrap_api_call (venv/lib/python3.12/site-packages/playwright/_impl/_connection.py):
Error: Page.content: Target page, context or browser has been closed

Code context:
  523       parsed_st = _extract_stack_trace_information_from_stack(st, is_internal)
  524       self._api_zone.set(parsed_st)
  525       try:
  526           return await cb()
  527       except Exception as error:
  528 →         raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
  529       finally:
  530           self._api_zone.set(None)
  531
  532   def wrap_api_call_sync(
  533       self, cb: Callable[[], Any], is_internal: bool = False
this happens after about 50 to 100 pages
I'm running on an EC2 t2.large, and this is my code:
@app.post("/crawl", response_model=CrawlResponse) async def crawl(request: CrawlRequest): """ Run the crawler on the specified URL """ print(request)
try:
# Convert UUID to string for the query
crawler_config = execute_select_query(f"SELECT * FROM crawls WHERE id = '{request.crawler_id}'")
if not crawler_config:
raise HTTPException(
status_code=404,
detail=f"Crawler config not found for id: {request.crawler_id}"
)
crawler_config = crawler_config[0]
root_url = crawler_config['root_url']
logger.info(f"π Starting crawl for URL: {root_url}")
depth = crawler_config.get('depth', 1)
include_external = crawler_config.get('include_external', False)
max_pages = crawler_config.get('max_pages', 5)
# Step 1: Create a pruning filter
prune_filter = PruningContentFilter(
# Lower β more content retained, higher β more content pruned
threshold=0.45,
# "fixed" or "dynamic"
threshold_type="dynamic",
# Ignore nodes with <5 words
min_word_threshold=5
)
# Step 2: Insert it into a Markdown Generator
md_generator = DefaultMarkdownGenerator(content_filter=prune_filter) #, options={"ignore_links": True}
# Step 3: Pass it to CrawlerRunConfig
# Configure the crawler
config = CrawlerRunConfig(
deep_crawl_strategy=BFSDeepCrawlStrategy(
max_depth=depth,
include_external=include_external,
max_pages=max_pages
),
scraping_strategy=LXMLWebScrapingStrategy(),
stream=True,
verbose=True,
markdown_generator=md_generator
)
crawled_pages = []
page_count = 0
# Run the crawler
async with AsyncWebCrawler() as crawler:
try:
async for result in await crawler.arun(crawler_config['root_url'], config=config):
processed_result = await process_crawl_result(crawler_config, result)
crawled_pages.append(processed_result)
page_count += 1
logger.info(f"Processed page {page_count}: {result.url}")
except Exception as crawl_error:
logger.error(f"Error during crawling: {str(crawl_error)}")
raise HTTPException(
status_code=500,
detail=f"Crawling process failed: {str(crawl_error)}"
)
result = {
"url": root_url,
"depth": depth,
"pages_crawled": page_count,
"crawled_pages": crawled_pages
}
return CrawlResponse(
status="success",
data=result
)
except Exception as e:
logger.error(f"Crawling error: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Crawling failed: {str(e)}"
)
Any idea how to debug this? What does this error mean?
My guess is that the headless browser is crashing, but I'm not sure how to debug it or why it would happen.
When I run a crawler with a simple fetch I can crawl all 483 pages on the website, but with crawl4ai it crashes after about 50 to 100 pages and just prints a list of these errors.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Ubuntu (EC2 t2.large)
Python version
3.12.3
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
Here is some more info:

- max_pages is ignored:

max_pages = 10

# Configure a 2-level deep crawl
config = CrawlerRunConfig(
    semaphore_count=1,
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=10,
        include_external=False,
        # Maximum number of pages to crawl (optional)
        max_pages=max_pages
    ),
    scraping_strategy=LXMLWebScrapingStrategy(),
    stream=True,  # Enable streaming
    verbose=True
)
- Adding a break:

page_count = 0
async with AsyncWebCrawler() as crawler:
    async for result in await crawler.arun("https://out-door.co.il/", config=config):
        page_count += 1
        print(f"page_count {page_count}")
        if page_count > 10:
            break
        await process_result(result)

causes this error:

[ERROR]... × https://out-door.co.il/product-category/%d7%9e%d7%... | Error:
Unexpected error in _crawl_web at line 579 in _crawl_web (venv/lib/python3.10/site-packages/crawl4ai/async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: net::ERR_ABORTED; maybe frame was detached?
Call log:
  - navigating to "https://out-door.co.il/product-category/%d7%9e%d7%a2%d7%a7%d7%94-%d7%a7%d7%a6%d7%94-%d7%9c%d7%9e%d7%9b%d7%99%d7%a8%d7%94/%d7%a1%d7%95%d7%92%d7%99-%d7%9e%d7%a2%d7%a7%d7%95%d7%aa", waiting until "domcontentloaded"

Code context:
  574       response = await page.goto(
  575           url, wait_until=config.wait_until, timeout=config.page_timeout
  576       )
  577       redirected_url = page.url
  578   except Error as e:
  579 →     raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
  580
  581   await self.execute_hook(
  582       "after_goto", page, context=context, url=url, response=response, config=config
  583   )
  584
Although it shows 500 crawled pages, it only saves 250. Does it know how to handle repeated links?
It seems I was able to suppress this issue by setting semaphore_count=1.
same problem
I'm pretty sure the problem is in playwright/chromium rather than crawl4ai
And that it is a resource problem
Note that a similar problem is reported on playwright proj
@eliaweiss Do you have the issue ID for the problem reported on the Playwright project? Can you link it here?
@aravindkarnam See this issue https://github.com/microsoft/playwright/issues/13038
The error message is different, but my log contained a ton of error messages, and I later realized that the first one was: browser.newContext: Target page, context or browser has been closed
which is also reported in playwright/issues/13038.
On my side I fixed it by switching from Chromium to Firefox (browser settings are described here):
https://docs.crawl4ai.com/api/parameters/
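For reference, here is a minimal sketch of that workaround using BrowserConfig's browser_type parameter (per the crawl4ai parameters doc linked above; verify the exact name against your version, and install the engine first with playwright install firefox):

import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    # Switch the underlying Playwright engine from Chromium to Firefox.
    browser_config = BrowserConfig(browser_type="firefox", headless=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://out-door.co.il/")
        print(result.success, result.url)

asyncio.run(main())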
Same problem. It consistently happens on the second crawl attempt. Any updates here?
Same problem here. I changed my browser to Firefox, and the bug was not fixed.
RCA
When making consecutive requests to the /crawl endpoint, the second request would fail with:
"BrowserType.launch: Target page, context or browser has been closed"
The BrowserManager class in Crawl4AI implemented a singleton pattern for the Playwright instance using a static class variable:
_playwright_instance = None

@classmethod
async def get_playwright(cls):
    if cls._playwright_instance is None:
        cls._playwright_instance = await async_playwright().start()
    return cls._playwright_instance
When the browser was closed after the first request, the close() method properly stopped the Playwright instance, but did not reset the static _playwright_instance reference:
async def close(self):
    # ...
    if self.playwright:
        await self.playwright.stop()
        self.playwright = None
    # Missing: BrowserManager._playwright_instance = None
This caused subsequent requests to try using an already-closed Playwright instance.
Why did this only appear in the server environment?
This issue specifically manifested in the server environment because:
- In server contexts, the process remains alive between requests
- Static/class variables persist across multiple requests
- In library usage, the process would typically terminate after use, naturally cleaning up all resources
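To make that concrete, here is a minimal reproduction sketch, assuming the singleton is the culprit: two back-to-back crawls in one long-lived process, which is effectively what a FastAPI server does between requests (the URL is just a placeholder).

import asyncio

from crawl4ai import AsyncWebCrawler

async def crawl_once(url: str):
    # Each `async with` block starts and then closes the crawler, like one server request.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url)
        print(url, result.success)

async def main():
    # The first call works; before the fix, the second call in the same process could fail with
    # "BrowserType.launch: Target page, context or browser has been closed" because the stopped
    # Playwright singleton was never reset to None.
    await crawl_once("https://example.com")
    await crawl_once("https://example.com")

asyncio.run(main())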
Solution
We modified the close() method in the AsyncPlaywrightCrawlerStrategy class to reset the Playwright instance after cleanup:
async def close(self):
    """
    Close the browser and clean up resources.
    """
    await self.browser_manager.close()
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
This ensures that each new request gets a fresh Playwright instance, preventing the error while maintaining the resource efficiency benefits of the singleton pattern within a single request's lifecycle.
@aravindkarnam awesome! Appreciate the quick turnaround! Is there a PR?
@aysan0 Yeah. This was quite a mole hunt! I need some help with testing this out first. I pushed this to the bug fix branch. Could you pull this, run it once and give me confirmation that this indeed fixes the issue.
It works. Thank you so much for fixing the bug!
Is that fixed now?
I don't think it is fixed yet; meanwhile you can monkey patch it in your code. When the fix is released you can upgrade the package and drop the patch.
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager


async def patched_async_playwright__crawler_strategy_close(self) -> None:
    """
    Close the browser and clean up resources.

    This patch addresses an issue with Playwright instance cleanup where the static instance
    wasn't being properly reset, leading to issues with multiple crawls.

    Issue: https://github.com/unclecode/crawl4ai/issues/842

    Returns:
        None
    """
    await self.browser_manager.close()
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None


AsyncPlaywrightCrawlerStrategy.close = patched_async_playwright__crawler_strategy_close
What is the latest patch version?
As @aravindkarnam mentioned, there's a fix on this branch.
It's not merged into the next or main branch yet. I'm taking some time today to test it, but so far the Playwright issue has not reappeared.
Steps to build and test:
- Fork the repo or clone the code
- cd into the repo
- git checkout 2025-MAR-ALPHA-1
- docker build -t crawl4ai:playwrite-fix .
- docker run -p <PORT_ON_YOUR_MACHINE>:8000 -e CRAWL4AI_API_TOKEN=<TOKEN> crawl4ai:playwrite-fix
- Make a request
Awesome work! Would be great to see it merged!
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    # Call the original close method
    await original_close(self)
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
worked! thank you!
The monkey patch still seems problematic for me when working with lots of concurrent tasks. I still have problems with this (just less often, but for some calls I get an error that the browser can no longer be found).
I have the code checked out that is supposed to have the fix. It definitely fixed the issue when making sequential requests, but I still seem to be getting this error when making concurrent requests. I'm hosting the Docker image built from this branch (which has the fix mentioned here) on AWS ECS with 4 vCPU and 8 GB of memory, but I'm still getting the issue. When making about 100 concurrent requests, about 25% of them return this error.
{
    'success': True,
    'results': [
        {
            'url': 'https://www.informationweek.com/it-leadership',
            'html': '',
            'success': False,
            'cleaned_html': None,
            'media': {},
            'links': {},
            'downloaded_files': None,
            'js_execution_result': None,
            'screenshot': None,
            'pdf': None,
            'extracted_content': None,
            'metadata': None,
            'error_message': """Unexpected error in _crawl_web at line 582 in _crawl_web (../usr/local/lib/python3.10/site-packages/crawl4ai/async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: Target page, context or browser has been closed

Code context:
  577       response = await page.goto(
  578           url, wait_until=config.wait_until, timeout=config.page_timeout
  579       )
  580       redirected_url = page.url
  581   except Error as e:
  582 →     raise RuntimeError(f"Failed on navigating ACS-GOTO:\\n{str(e)}")
  583
  584   await self.execute_hook(
  585       "after_goto", page, context=context, url=url, response=response, config=config
  586   )
  587 """,
            'session_id': None,
            'response_headers': None,
            'status_code': None,
            'ssl_certificate': None,
            'dispatch_result': None,
            'redirected_url': None
        }
    ]
}
Here is the full error message for one of the requests
@viraj-lunani I don't think this is the same error. Can you share your input payload with me, so I can verify?
Hey, thanks for the response! The payload looks like this:
def get_markdown_crawl4ai(url, base_url='http://localhost:8777/crawl', attempts=0):
    crawl_payload = {
        "urls": [url],
        "browser_config": {"headless": True},
        "crawler_config": {"stream": False}
    }
    response = requests.post(
        base_url,
        json=crawl_payload
    )
    ...
There were 2 monkey patches suggested in this thread; the second one got garbled, so here it is again:
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

logger = logging.getLogger(__name__)

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    # Call the original close method
    await original_close(self)
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close")
Why is this better?
The first version re-implements close, so it breaks if the library's close implementation changes. The second one is more robust: it doesn't assume anything about the implementation; it stores original_close and calls it directly.
How do I apply the monkey patch?
# Place this in your main file
# Apply monkey patch for AsyncPlaywrightCrawlerStrategy
from app.crawl.patch.patched_async_playwright__crawler_strategy_close import *
Make sure you see "Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close" in your logs
@aravindkarnam Pretty sure @viraj-lunani is hitting the same issue: it's that "browser has been closed" error again.
Makes sense that with concurrent crawls on a singleton instance, you could hit a race condition: one thread closes the browser while another tries to use it before it's nullified.
Sequential runs are fine with the current fix since the code checks for null and reinitializes the browser.
But yeah, this setup isn't thread-safe. A few thoughts for you to consider:
1. *Should we even close the Playwright browser?*
Ideally yes, but since it's a singleton and the main operation is crawling, maybe we just keep it open?
2. *Make browser close + null assignment atomic with a semaphore*
This should ensure thread-safety. Just be cautious: semaphores can cause deadlocks. To avoid that, maybe use a single global semaphore.
3. *Catch the error and recreate the browser*
Easiest fix. If the browser is closed, recreate it just like when it's null.
4. *Add reference counting to the singleton*
More complex, but if done right, gives you solid control.
Note: I would consider implementing 2+3+4 for a fully correct, robust solution.
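For illustration, here is a rough sketch of option 3 (catch the error and recreate the browser). The helper name and retry policy are mine, not crawl4ai API, and it assumes the usual pattern where crawl4ai reports navigation failures via result.success / result.error_message rather than raising.

import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def crawl_with_retry(url: str, config: CrawlerRunConfig, max_attempts: int = 2):
    """Retry with a brand-new crawler if the browser was closed underneath us (hypothetical helper)."""
    result = None
    for attempt in range(max_attempts):
        # A fresh AsyncWebCrawler per attempt means a fresh browser/context, so a
        # previously closed instance cannot poison this call.
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url, config=config)
        # Retry only on the closed-browser case; return anything else as-is.
        if result.success or "has been closed" not in (result.error_message or ""):
            return result
        await asyncio.sleep(1)  # brief backoff before recreating the browser
    return result

async def main():
    result = await crawl_with_retry("https://example.com", CrawlerRunConfig(stream=False))
    print(result.success)

asyncio.run(main())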
@aravindkarnam @viraj-lunani
Here's a semaphore-based solution; it should be thread-safe, but without more context it's hard to judge the risks. I had to use the semaphore in both the close and start functions to make it work properly.
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import asyncio
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

logger = logging.getLogger(__name__)

# Create a semaphore for atomic operations
_close_semaphore = asyncio.Semaphore(1)

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    async with _close_semaphore:
        # Call the original close method
        await original_close(self)
        # Reset the static Playwright instance
        BrowserManager._playwright_instance = None
    return


AsyncPlaywrightCrawlerStrategy.close = patched_close

# Store the original start method
original_start = AsyncPlaywrightCrawlerStrategy.start


async def patched_start(self):
    """
    Patched start method that serializes startup with the same semaphore,
    so a concurrent close cannot race with browser creation.
    """
    async with _close_semaphore:
        await original_start(self)
    return


AsyncPlaywrightCrawlerStrategy.start = patched_start

# Apply the monkey patch
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close and .start")
Final remark from my side: I disabled the close function completely, since I don't see a reason to close the browser, crawling being the sole purpose of my code.
I believe that in most use cases this is the best solution.
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

logger = logging.getLogger(__name__)


async def patched_close(self):
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close")
@aravindkarnam has this ever been merged into main? Any plans or blockers?
I have v0.6.2, and sometimes many pages return "Page.goto: Target page, context or browser has been closed" during arun_many. Is this a different problem?