[Bug]: Failing to auth with proxy
crawl4ai version
0.6.2
Expected Behavior
The proxy documentation needs to be updated. If you pass the proxy config as a dict, you get an exception, because it now expects an actual proxy object.
When I replace the dict with the ProxyConfig class, I expect it to work normally and not fail to authenticate.
Current Behavior
Configuring the browser like so:
# Build the crawler config with more configuration for dynamic content
crawl_config = CrawlerRunConfig(
    extraction_strategy=llm_strategy,
    cache_mode=CacheMode.BYPASS,
    # wait_until="networkidle",
    wait_until="domcontentloaded",
    page_timeout=60000,
    fetch_ssl_certificate=False,
)

# Create browser config
browser_cfg = BrowserConfig(
    headless=CONFIG.BROWSER_HEADLESS,
    java_script_enabled=True,  # Ensure JavaScript is enabled
    proxy_config=ProxyConfig(
        server=f"https://{CONFIG.SMARTPROXY_API_KEY}:@api.zyte.com:8011",
    ),
)
results in an ERR_PROXY_CONNECTION_FAILED error:
{'detail': '400: Unexpected error in _crawl_web at line 731 in _crawl_web (.venv\\Lib\\site-packages\\crawl4ai\\async_crawler_strategy.py):\nError: Failed on navigating ACS-GOTO:\nPage.goto: net::ERR_PROXY_CONNECTION_FAILED at https://wyomingcompany.com/aged-corporation/?state=New%20York\nCall log:\n - navigating to "https://wyomingcompany.com/aged-corporation/?state=New%20York", waiting until "domcontentloaded"\n\n\nCode context:\n 726 response = await page.goto(\n 727 url, wait_until=config.wait_until, timeout=config.page_timeout\n 728 )\n 729 redirected_url = page.url\n 730 except Error as e:\n 731 → raise RuntimeError(f"Failed on navigating ACS-GOTO:\\n{str(e)}")\n 732 \n 733 await self.execute_hook(\n 734 "after_goto", page, context=context, url=url, response=response, config=config\n 735 )\n 736 '}
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Windows 11
Python version
3.12.10
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response
I have the same problem. I get this error:
[ERROR]... × https://example.com | Error:
× Unexpected error in _crawl_web at line 731 in _crawl_web (venv\Lib\site-packages\crawl4ai\async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: net::ERR_INVALID_AUTH_CREDENTIALS at https://example.com/
Call log:
 - navigating to "https://example.com/", waiting until "domcontentloaded"

Code context:
 726 response = await page.goto(
 727 url, wait_until=config.wait_until, timeout=config.page_timeout
 728 )
 729 redirected_url = page.url
 730 except Error as e:
 731 → raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
 732
 733 await self.execute_hook(
 734 "after_goto", page, context=context, url=url, response=response, config=config
 735 )
 736
@aravindkarnam thanks for adding this to your milestone 🙏
There shouldn't be an issue using a proxy service that authenticates with just an API key, right?
import requests

proxies = {
    "http": f"http://{PROXY_API_KEY}:@proxy.zyte.com:8011",
    "https": f"http://{PROXY_API_KEY}:@proxy.zyte.com:8011",
}
response = requests.get("https://google.com", proxies=proxies, verify=False)
print(response.status_code)  # works as expected, 200 response
@JWBWork Can you try once and see if that works? Also, give me an example of such a proxy provider (one that uses an API key for auth instead of a username and password), so I can try that out as well.
@aravindkarnam my company uses this service for proxying: https://www.zyte.com/. It only requires an API key.
And to clarify, the example I gave above works for me: I can authenticate and get a 200 response when I just use requests.
Apologies for the delayed response, just got back from vaca 🏖️
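One thing that may be worth trying while this is open: the two snippets in this thread do not use the same proxy URL. The requests example uses http:// with proxy.zyte.com, while the BrowserConfig example uses https:// with api.zyte.com. In addition, Playwright takes the proxy username and password as fields separate from the server, and as far as I know Chromium does not honor credentials embedded in the proxy server URL. The sketch below uses only the standard library to split a credentialed proxy URL into those separate parts. The helper name split_proxy_url and the YOUR_API_KEY placeholder are made up for illustration, and whether ProxyConfig accepts username and password keyword arguments in 0.6.2 is an assumption to verify against the installed version.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url: str) -> dict:
    """Split a proxy URL with embedded credentials into separate fields.

    Hypothetical helper, not part of crawl4ai or Playwright.
    """
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    return {
        # Server without the user-info part
        "server": f"{parts.scheme}://{parts.hostname}:{parts.port}",
        # API-key style auth: the key is the username, the password is empty
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


# Same URL shape as the working requests example, with a placeholder key
fields = split_proxy_url("http://YOUR_API_KEY:@proxy.zyte.com:8011")
print(fields)
```

If ProxyConfig does accept these fields, the result could be passed as ProxyConfig(**fields) in place of the single credentialed server string. That would at least show whether the failure comes from the embedded credentials or from the scheme and host.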