[Bug]: Error: Page.content: Target page, context or browser has been closed
crawl4ai version
0.5.0.post4
Expected Behavior
Crawler should crawl
Current Behavior
I get the following error
[ERROR]... × https://out-door.co.il/product/%d7%a4%d7%90%d7%a0%... | Error:
Unexpected error in _crawl_web at line 528 in wrap_api_call (venv/lib/python3.12/site-packages/playwright/_impl/_connection.py):
Error: Page.content: Target page, context or browser has been closed

Code context:
  523       parsed_st = _extract_stack_trace_information_from_stack(st, is_internal)
  524       self._api_zone.set(parsed_st)
  525       try:
  526           return await cb()
  527       except Exception as error:
  528 →         raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
  529       finally:
  530           self._api_zone.set(None)
  531
  532   def wrap_api_call_sync(
  533       self, cb: Callable[[], Any], is_internal: bool = False
this happens after about 50 to 100 pages
I'm running on an EC2 t2.large, and this is my code:
@app.post("/crawl", response_model=CrawlResponse) async def crawl(request: CrawlRequest): """ Run the crawler on the specified URL """ print(request)
try:
# Convert UUID to string for the query
crawler_config = execute_select_query(f"SELECT * FROM crawls WHERE id = '{request.crawler_id}'")
if not crawler_config:
raise HTTPException(
status_code=404,
detail=f"Crawler config not found for id: {request.crawler_id}"
)
crawler_config = crawler_config[0]
root_url = crawler_config['root_url']
logger.info(f"π Starting crawl for URL: {root_url}")
depth = crawler_config.get('depth', 1)
include_external = crawler_config.get('include_external', False)
max_pages = crawler_config.get('max_pages', 5)
# Step 1: Create a pruning filter
prune_filter = PruningContentFilter(
# Lower β more content retained, higher β more content pruned
threshold=0.45,
# "fixed" or "dynamic"
threshold_type="dynamic",
# Ignore nodes with <5 words
min_word_threshold=5
)
# Step 2: Insert it into a Markdown Generator
md_generator = DefaultMarkdownGenerator(content_filter=prune_filter) #, options={"ignore_links": True}
# Step 3: Pass it to CrawlerRunConfig
# Configure the crawler
config = CrawlerRunConfig(
deep_crawl_strategy=BFSDeepCrawlStrategy(
max_depth=depth,
include_external=include_external,
max_pages=max_pages
),
scraping_strategy=LXMLWebScrapingStrategy(),
stream=True,
verbose=True,
markdown_generator=md_generator
)
crawled_pages = []
page_count = 0
# Run the crawler
async with AsyncWebCrawler() as crawler:
try:
async for result in await crawler.arun(crawler_config['root_url'], config=config):
processed_result = await process_crawl_result(crawler_config, result)
crawled_pages.append(processed_result)
page_count += 1
logger.info(f"Processed page {page_count}: {result.url}")
except Exception as crawl_error:
logger.error(f"Error during crawling: {str(crawl_error)}")
raise HTTPException(
status_code=500,
detail=f"Crawling process failed: {str(crawl_error)}"
)
result = {
"url": root_url,
"depth": depth,
"pages_crawled": page_count,
"crawled_pages": crawled_pages
}
return CrawlResponse(
status="success",
data=result
)
except Exception as e:
logger.error(f"Crawling error: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Crawling failed: {str(e)}"
)
Any idea how to debug this? What does this error mean?
My guess is that the headless browser is crashing, but I'm not sure how to debug it or why it would happen.
When I run a crawler with a simple fetch I can crawl all 483 pages on the website, but with crawl4ai it crashes after about 50 to 100 pages and just prints a list of these errors.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Ubuntu (EC2 t2.large)
Python version
3.12.3
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
Here is some more info:

- max_pages is ignored:

max_pages = 10

# Configure a 2-level deep crawl
config = CrawlerRunConfig(
    semaphore_count=1,
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=10,
        include_external=False,
        # Maximum number of pages to crawl (optional)
        max_pages=max_pages
    ),
    scraping_strategy=LXMLWebScrapingStrategy(),
    stream=True,  # Enable streaming
    verbose=True
)
- Adding a break:

page_count = 0
async with AsyncWebCrawler() as crawler:
    async for result in await crawler.arun("https://out-door.co.il/", config=config):
        page_count += 1
        print(f"page_count {page_count}")
        if page_count > 10:
            break
        await process_result(result)

causes this error:

[ERROR]... × https://out-door.co.il/product-category/%d7%9e%d7%... | Error:
Unexpected error in _crawl_web at line 579 in _crawl_web (venv/lib/python3.10/site-packages/crawl4ai/async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: net::ERR_ABORTED; maybe frame was detached?
Call log:
  - navigating to "https://out-door.co.il/product-category/%d7%9e%d7%a2%d7%a7%d7%94-%d7%a7%d7%a6%d7%94-%d7%9c%d7%9e%d7%9b%d7%99%d7%a8%d7%94/%d7%a1%d7%95%d7%92%d7%99-%d7%9e%d7%a2%d7%a7%d7%95%d7%aa", waiting until "domcontentloaded"

Code context:
  574       response = await page.goto(
  575           url, wait_until=config.wait_until, timeout=config.page_timeout
  576       )
  577       redirected_url = page.url
  578   except Error as e:
  579 →     raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
  580
  581   await self.execute_hook(
  582       "after_goto", page, context=context, url=url, response=response, config=config
  583   )
  584
Although it shows 500 crawled pages, it only saves 250. Does it know how to handle repeated links?
It seems I was able to suppress this issue by setting semaphore_count=1.
same problem
I'm pretty sure the problem is in playwright/chromium rather than crawl4ai
And that it is a resource problem
Note that a similar problem is reported on playwright proj
@eliaweiss Do you have the issue ID for the problem reported on the Playwright project? Can you link it here?
@aravindkarnam See this issue https://github.com/microsoft/playwright/issues/13038
The error message is different, but my log contained a ton of error messages, and I later realized that the first one was: browser.newContext: Target page, context or browser has been closed
which is also reported in playwright/issues/13038.
On my side I fixed it by switching from Chromium to Firefox (browser settings are described here):
https://docs.crawl4ai.com/api/parameters/
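For reference, here is a minimal sketch of that workaround using BrowserConfig's browser_type parameter (per the crawl4ai parameters doc linked above; verify the exact name against your version, and install the engine first with playwright install firefox):

import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    # Switch the underlying Playwright engine from Chromium to Firefox.
    browser_config = BrowserConfig(browser_type="firefox", headless=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://out-door.co.il/")
        print(result.success, result.url)

asyncio.run(main())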
Same problem. It consistently happens on the second crawl attempt. Any updates here?
Same problem here. I changed my browser to Firefox, and the bug was not fixed.
RCA
When making consecutive requests to the /crawl endpoint, the second request would fail with:
"BrowserType.launch: Target page, context or browser has been closed"
The BrowserManager class in Crawl4AI implemented a singleton pattern for the Playwright instance using a static class variable:
_playwright_instance = None

@classmethod
async def get_playwright(cls):
    if cls._playwright_instance is None:
        cls._playwright_instance = await async_playwright().start()
    return cls._playwright_instance
When the browser was closed after the first request, the close() method properly stopped the Playwright instance, but did not reset the static _playwright_instance reference:
async def close(self):
    # ...
    if self.playwright:
        await self.playwright.stop()
        self.playwright = None
    # Missing: BrowserManager._playwright_instance = None
This caused subsequent requests to try using an already-closed Playwright instance.
Why did this only appear in the server environment?
This issue specifically manifested in the server environment because:
- In server contexts, the process remains alive between requests
- Static/class variables persist across multiple requests
- In library usage, the process would typically terminate after use, naturally cleaning up all resources
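To make that concrete, here is a minimal reproduction sketch, assuming the singleton is the culprit: two back-to-back crawls in one long-lived process, which is effectively what a FastAPI server does between requests (the URL is just a placeholder).

import asyncio

from crawl4ai import AsyncWebCrawler

async def crawl_once(url: str):
    # Each `async with` block starts and then closes the crawler, like one server request.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url)
        print(url, result.success)

async def main():
    # The first call works; before the fix, the second call in the same process could fail with
    # "BrowserType.launch: Target page, context or browser has been closed" because the stopped
    # Playwright singleton was never reset to None.
    await crawl_once("https://example.com")
    await crawl_once("https://example.com")

asyncio.run(main())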
Solution
We modified the close() method in the AsyncPlaywrightCrawlerStrategy class to reset the Playwright instance after cleanup:
async def close(self):
    """
    Close the browser and clean up resources.
    """
    await self.browser_manager.close()
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
This ensures that each new request gets a fresh Playwright instance, preventing the error while maintaining the resource efficiency benefits of the singleton pattern within a single request's lifecycle.
@aravindkarnam awesome! Appreciate the quick turnaround! Is there a PR?
@aysan0 Yeah. This was quite a mole hunt! I need some help with testing this out first. I pushed this to the bug fix branch. Could you pull this, run it once and give me confirmation that this indeed fixes the issue.
It works. Thank you so much for fixing the bug!
Is that fixed now?
I don't think it is fixed yet; meanwhile you can monkey patch it in your code. When the fix is released you can upgrade the package and drop the patch.
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager


async def patched_async_playwright__crawler_strategy_close(self) -> None:
    """
    Close the browser and clean up resources.

    This patch addresses an issue with Playwright instance cleanup where the static instance
    wasn't being properly reset, leading to issues with multiple crawls.

    Issue: https://github.com/unclecode/crawl4ai/issues/842

    Returns:
        None
    """
    await self.browser_manager.close()
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None


AsyncPlaywrightCrawlerStrategy.close = patched_async_playwright__crawler_strategy_close
What is the latest patch version?
As @aravindkarnam mentioned, there's a fix on this branch.
It's not merged into the next or main branch yet. I'm taking some time today to test it, but so far the Playwright issue has not reappeared.
Steps to build and test:
- Fork the repo or clone the code
- cd into the repo
- git checkout 2025-MAR-ALPHA-1
- docker build -t crawl4ai:playwrite-fix .
- docker run -p <PORT_ON_YOUR_MACHINE>:8000 -e CRAWL4AI_API_TOKEN=<TOKEN> crawl4ai:playwrite-fix
- Make a request
Awesome work! Would be great to see it merged!
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    # Call the original close method
    await original_close(self)
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
worked! thank you!
The monkey patch still seems problematic for me when working with lots of concurrent tasks. I still have problems with this (just less often, but for some calls I get an error that the browser can no longer be found).
I have the code checked out that is supposed to have the fix. It definitely fixed the issue when making sequential requests, but I still seem to be getting this error when making concurrent requests. I'm hosting the Docker image built from this branch (which has the fix mentioned here) on AWS ECS with 4 vCPU and 8 GB of memory, but I'm still getting the issue. When making about 100 concurrent requests, about 25% of them return this error.
{
    'success': True,
    'results': [
        {
            'url': 'https://www.informationweek.com/it-leadership',
            'html': '',
            'success': False,
            'cleaned_html': None,
            'media': {},
            'links': {},
            'downloaded_files': None,
            'js_execution_result': None,
            'screenshot': None,
            'pdf': None,
            'extracted_content': None,
            'metadata': None,
            'error_message': """Unexpected error in _crawl_web at line 582 in _crawl_web (../usr/local/lib/python3.10/site-packages/crawl4ai/async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: Target page, context or browser has been closed

Code context:
  577       response = await page.goto(
  578           url, wait_until=config.wait_until, timeout=config.page_timeout
  579       )
  580       redirected_url = page.url
  581   except Error as e:
  582 →     raise RuntimeError(f"Failed on navigating ACS-GOTO:\\n{str(e)}")
  583
  584   await self.execute_hook(
  585       "after_goto", page, context=context, url=url, response=response, config=config
  586   )
  587 """,
            'session_id': None,
            'response_headers': None,
            'status_code': None,
            'ssl_certificate': None,
            'dispatch_result': None,
            'redirected_url': None
        }
    ]
}
Here is the full error message for one of the requests
@viraj-lunani I don't think this is the same error. Can you share your input payload with me, so I can verify?
Hey, thanks for the response! The payload looks like this:
def get_markdown_crawl4ai(url, base_url='http://localhost:8777/crawl', attempts=0):
    crawl_payload = {
        "urls": [url],
        "browser_config": {"headless": True},
        "crawler_config": {"stream": False}
    }
    response = requests.post(
        base_url,
        json=crawl_payload
    )
    ...
There were 2 monkey patches suggested in this thread; the second one got garbled, so here it is again:
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

logger = logging.getLogger(__name__)

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    # Call the original close method
    await original_close(self)
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close")
Why is this better?
The first version re-implements close, so it breaks if the library's close implementation changes. The second one is more robust: it doesn't assume anything about the implementation; it stores original_close and calls it directly.
How do I apply the monkey patch?
# Place this in your main file
# Apply monkey patch for AsyncPlaywrightCrawlerStrategy
from app.crawl.patch.patched_async_playwright__crawler_strategy_close import *
Make sure you see "Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close" in your logs
@aravindkarnam Pretty sure @viraj-lunani is hitting the same issue: it's that "browser has been closed" error again.
Makes sense that with concurrent crawls on a singleton instance, you could hit a race condition: one thread closes the browser while another tries to use it before it's nullified.
Sequential runs are fine with the current fix since the code checks for null and reinitializes the browser.
But yeah, this setup isn't thread-safe. A few thoughts for you to consider:
1. *Should we even close the Playwright browser?*
Ideally yes, but since it's a singleton and the main operation is crawling, maybe we just keep it open?
2. *Make browser close + null assignment atomic with a semaphore*
This should ensure thread-safety. Just be cautious: semaphores can cause deadlocks. To avoid that, maybe use a single global semaphore.
3. *Catch the error and recreate the browser*
Easiest fix. If the browser is closed, recreate it just like when it's null.
4. *Add reference counting to the singleton*
More complex, but if done right, gives you solid control.
Note: I would consider implementing 2+3+4 for a fully correct, robust solution.
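For illustration, here is a rough sketch of option 3 (catch the error and recreate the browser). The helper name and retry policy are mine, not crawl4ai API, and it assumes the usual pattern where crawl4ai reports navigation failures via result.success / result.error_message rather than raising.

import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def crawl_with_retry(url: str, config: CrawlerRunConfig, max_attempts: int = 2):
    """Retry with a brand-new crawler if the browser was closed underneath us (hypothetical helper)."""
    result = None
    for attempt in range(max_attempts):
        # A fresh AsyncWebCrawler per attempt means a fresh browser/context, so a
        # previously closed instance cannot poison this call.
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url, config=config)
        # Retry only on the closed-browser case; return anything else as-is.
        if result.success or "has been closed" not in (result.error_message or ""):
            return result
        await asyncio.sleep(1)  # brief backoff before recreating the browser
    return result

async def main():
    result = await crawl_with_retry("https://example.com", CrawlerRunConfig(stream=False))
    print(result.success)

asyncio.run(main())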
@aravindkarnam @viraj-lunani
Here's a semaphore-based solution; it should be thread-safe, but without more context it's hard to judge the risks. I had to use the semaphore in both the close and start functions to make it work properly.
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue in crawl4ai
"""
import asyncio
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

logger = logging.getLogger(__name__)

# Create a semaphore for atomic operations
_close_semaphore = asyncio.Semaphore(1)

# Store the original close method
original_close = AsyncPlaywrightCrawlerStrategy.close


async def patched_close(self):
    """
    Patched close method that resets the Playwright instance after cleanup.

    This fixes the issue where subsequent crawl requests fail with:
    "BrowserType.launch: Target page, context or browser has been closed"
    """
    async with _close_semaphore:
        # Call the original close method
        await original_close(self)
        # Reset the static Playwright instance
        BrowserManager._playwright_instance = None
    return


AsyncPlaywrightCrawlerStrategy.close = patched_close

# Store the original start method
original_start = AsyncPlaywrightCrawlerStrategy.start


async def patched_start(self):
    """
    Patched start method that serializes startup with the same semaphore,
    so a concurrent close cannot race with browser creation.
    """
    async with _close_semaphore:
        await original_start(self)
    return


AsyncPlaywrightCrawlerStrategy.start = patched_start

# Apply the monkey patch
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close and .start")
Final remark from my side: I disabled the close function completely, since I don't see a reason to close the browser, crawling being the sole purpose of my code.
I believe that in most use cases this is the best solution.
import logging

from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

logger = logging.getLogger(__name__)


async def patched_close(self):
    return


# Apply the monkey patch
AsyncPlaywrightCrawlerStrategy.close = patched_close
logger.info("Monkey patch applied for AsyncPlaywrightCrawlerStrategy.close")
@aravindkarnam has this ever been merged into main? Any plans or blockers?
I have v0.6.2, and sometimes many pages return "Page.goto: Target page, context or browser has been closed" during arun_many. Is this a different problem?