
[Bug]: error on url that redirects to a download

Open ederuiter opened this issue 11 months ago • 2 comments

crawl4ai version

2025-feb-alpha-1

Expected Behavior

When given a URL that redirects to a download, I would expect the crawl to work (provided that you set accept_downloads + downloads_path in your browser config).

Current Behavior

It gives a net::ERR_ABORTED error on Chromium. According to https://github.com/microsoft/playwright/issues/28729#issuecomment-1863643942 this is expected behavior (although browser dependent).

Is this reproducible?

Yes

Inputs Causing the Bug

- add code to force PDF downloads (see code snippet below)
- crawl a URL like: https://www.digitalmatter.com/overview

Steps to Reproduce


Code snippets

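# I use the following code to force PDF files to download instead of showing them in the browser

        from playwright.async_api import Page, Request, Route
        from crawl4ai import AsyncWebCrawler
        from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

        async def handle_pdf(route: Route, request: Request):
            # Fetch the real response, then mark PDFs as attachments so the browser downloads them
            response = await route.fetch()
            headers = dict(response.headers)
            # .get avoids a KeyError when no content-type is sent; startswith tolerates parameters
            if headers.get("content-type", "").startswith("application/pdf"):
                headers["content-disposition"] = "attachment"
            return await route.fulfill(headers=headers, response=response)

        async def init_route(page: Page, context, url, config):
            # Replace any existing routes with the PDF handler before each navigation
            await page.unroute_all()
            await page.route("**/*", handle_pdf)

        # browser_config is the browser config with accept_downloads + downloads_path set
        crawler_strategy = AsyncPlaywrightCrawlerStrategy(browser_config=browser_config)
        crawler_strategy.set_hook("before_goto", init_route)

        AsyncWebCrawler(crawler_strategy=crawler_strategy)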
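
The header rewrite inside handle_pdf can also be pulled out into a pure function, which makes it easy to check without a browser. This is only a sketch: `force_pdf_attachment` is a hypothetical name, not something from crawl4ai or Playwright, and it assumes the headers arrive as a plain mapping of strings.

```python
from typing import Dict, Mapping


def force_pdf_attachment(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that forces PDFs to download.

    Hypothetical helper for illustration only. Header names are lower-cased,
    and a content type with parameters (e.g. "application/pdf; charset=binary")
    still counts as a PDF.
    """
    result = {name.lower(): value for name, value in headers.items()}
    # Strip any parameters after ';' before comparing the media type
    media_type = result.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        result["content-disposition"] = "attachment"
    return result
```

In handle_pdf this would be used as `headers = force_pdf_attachment(response.headers)` before calling `route.fulfill`.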

OS

Linux

Python version

any

Browser

Chromium

Browser version

any

Error logs & Screenshots (if applicable)

No response

ederuiter avatar Feb 20 '25 14:02 ederuiter

https://github.com/unclecode/crawl4ai/blob/99fa2d09082b0ff561a033702e3dd194cf93271e/crawl4ai/async_crawler_strategy.py#L1356

Changing this code to:

                    if 'net::ERR_ABORTED' in str(e):
                        response = None
                    else:
                        raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")

Fixes the issue. I can provide a PR if needed, although this is not the prettiest fix.
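
For anyone who wants to try the same idea outside the library, here is a minimal sketch of the pattern. `goto_allowing_download` is a hypothetical helper name, not part of crawl4ai. It only assumes that `page` has an async `goto` method (as Playwright's `Page` does) and that the aborted navigation surfaces as an exception whose message contains `net::ERR_ABORTED`.

```python
import asyncio
from typing import Any, Optional


async def goto_allowing_download(page: Any, url: str, **goto_kwargs: Any) -> Optional[Any]:
    """Navigate to url; return None instead of raising when navigation is aborted.

    Hypothetical helper, not part of crawl4ai. Assumes page.goto is an async
    callable and that a navigation which turns into a download raises an
    error mentioning net::ERR_ABORTED.
    """
    try:
        return await page.goto(url, **goto_kwargs)
    except Exception as e:
        if "net::ERR_ABORTED" in str(e):
            # Navigation was aborted (e.g. the URL became a download), so there is no response
            return None
        # Any other failure is still reported, with the same message as the original code
        raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{e}") from e
```

Note that this treats every `net::ERR_ABORTED` as harmless, including aborts that have nothing to do with downloads, which is part of why it is not the prettiest fix. Callers also need to handle a `None` response.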

ederuiter avatar Feb 20 '25 14:02 ederuiter

@ederuiter thanks for root causing this! I'll make this change in the upcoming alpha release after v0.5

aravindkarnam avatar Mar 01 '25 13:03 aravindkarnam

I've made the change, and it'll be included in the upcoming alpha release after v0.5. @aravindkarnam

ntohidi avatar Apr 17 '25 10:04 ntohidi