[Bug]: error on url that redirects to a download
crawl4ai version
2025-feb-alpha-1
Expected Behavior
When given a URL that redirects to a download, I would expect the crawl to succeed (provided that accept_downloads and downloads_path are set in the browser config).
Current Behavior
The crawl fails with a net::ERR_ABORTED error on Chromium.
According to https://github.com/microsoft/playwright/issues/28729#issuecomment-1863643942 this is expected (although browser-dependent) behavior.
Is this reproducible?
Yes
Inputs Causing the Bug
- add code to force pdf download (see code snippet below)
- crawl a url like: https://www.digitalmatter.com/overview
Steps to Reproduce
Code snippets
# I use the following code to force pdf files to download instead of showing them in the browser
async def handle_pdf(route: Route, request: Request):
    response = await route.fetch()
    headers = response.headers
    if response.headers["content-type"] == "application/pdf":
        headers["content-disposition"] = "attachment"
    return await route.fulfill(headers=headers, response=response)

async def init_route(page: Page, context, url, config):
    await page.unroute_all()
    await page.route("**/*", handle_pdf)

crawler_strategy = AsyncPlaywrightCrawlerStrategy(browser_config=browser_config)
crawler_strategy.set_hook("before_goto", init_route)
AsyncWebCrawler(crawler_strategy=crawler_strategy)
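For anyone reusing the snippet above: the header check raises a KeyError when a response has no content-type header, and it misses values such as "application/pdf; charset=binary". A minimal sketch of a more tolerant check, written as a pure function so it can be tried without a browser. The name force_pdf_attachment is hypothetical and not part of crawl4ai or Playwright.

```python
def force_pdf_attachment(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` that asks the browser to download PDFs.

    Hypothetical helper for illustration; not part of crawl4ai or Playwright.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    updated = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=binary" before comparing.
    media_type = updated.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        updated["content-disposition"] = "attachment"
    return updated
```

Inside handle_pdf this could be used as `headers = force_pdf_attachment(response.headers)` before calling `route.fulfill(...)`.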
OS
Linux
Python version
any
Browser
Chromium
Browser version
any
Error logs & Screenshots (if applicable)
No response
https://github.com/unclecode/crawl4ai/blob/99fa2d09082b0ff561a033702e3dd194cf93271e/crawl4ai/async_crawler_strategy.py#L1356
Changing this code to the following fixes the issue:
if 'net::ERR_ABORTED' in str(e):
    response = None
else:
    raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
I can provide a PR if needed, although this is not the prettiest fix.
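To make the proposed behavior easier to reason about in isolation, here is a rough sketch of the same idea as a standalone wrapper. It assumes only that the page object has an async goto method that raises an exception whose message contains net::ERR_ABORTED when navigation turns into a download. The names goto_tolerating_download and FakePage are hypothetical, and FakePage only stands in for a real Playwright page so the sketch runs without a browser.

```python
import asyncio

ABORTED_MARKER = "net::ERR_ABORTED"


async def goto_tolerating_download(page, url, **goto_kwargs):
    """Navigate to `url`; return None if navigation was aborted.

    Hypothetical wrapper for illustration, mirroring the proposed change.
    """
    try:
        return await page.goto(url, **goto_kwargs)
    except Exception as e:
        # An aborted navigation is treated as "no response" instead of a failure.
        if ABORTED_MARKER in str(e):
            return None
        raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{e}") from e


class FakePage:
    """Stand-in for a Playwright page (assumption, for demonstration only)."""

    def __init__(self, error=None):
        self.error = error

    async def goto(self, url, **kwargs):
        if self.error is not None:
            raise self.error
        return f"response for {url}"


if __name__ == "__main__":
    aborted = FakePage(Exception("Page.goto: net::ERR_ABORTED at https://example.com/file"))
    print(asyncio.run(goto_tolerating_download(aborted, "https://example.com/file")))
```

Note that net::ERR_ABORTED is not specific to downloads, so this also swallows other aborted navigations, which may be part of why it is "not the prettiest fix".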
@ederuiter thanks for root causing this! I'll make this change in the upcoming alpha release after v0.5
I've made the change, and it'll be included in the upcoming alpha release after v0.5. @aravindkarnam