[Bug]: Unable to scrape Cloudflare-protected sites
crawl4ai version
0.4.248
Expected Behavior
It should be able to scrape sites by bypassing Cloudflare, but it is unable to do so.
Current Behavior
It is unable to bypass sites protected by Cloudflare. Example: https://food52.com/
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Linux
Python version
3.11
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response
@complete-dope Can you share your code snippet, or the CrawlerRunConfig and BrowserConfig you are using to scrape this site?
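For reference, the two config objects in the Python library look roughly like this (a minimal sketch using the standard crawl4ai AsyncWebCrawler API; adjust to match whatever you are actually running):

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def main():
    # Browser-level settings (headless mode, user agent, proxy, etc.)
    browser_config = BrowserConfig(headless=True)
    # Per-crawl settings; CacheMode.BYPASS forces a fresh fetch
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://food52.com/", config=run_config)
        print(result.success, result.status_code)

asyncio.run(main())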
I'm running crawl4ai in the docker container, so I am simply sending it to the docker container to chew on.
I am using the default configuration for scraping
@complete-dope @BarryBahrami
Hey! I just tested crawling the website, and it worked fine, both with and without Docker. Could you share the config you used so we can compare?
The code I run:
import asyncio
import requests

async def call():
    # Configuration objects converted to the required JSON structure
    browser_config_payload = {
        "type": "BrowserConfig",
        "params": {"headless": True}
    }
    crawler_config_payload = {
        "type": "CrawlerRunConfig",
        "params": {"stream": False, "cache_mode": "bypass"}  # Use string value of enum
    }

    crawl_payload = {
        "urls": ["https://food52.com/"],
        "browser_config": browser_config_payload,
        "crawler_config": crawler_config_payload
    }

    response = requests.post(
        "http://localhost:11235/crawl",  # Updated port
        # headers={"Authorization": f"Bearer {token}"},  # If JWT is enabled
        json=crawl_payload
    )

    print(f"Status Code: {response.status_code}")
    if response.ok:
        print(response.json())
    else:
        print(f"Error: {response.text}")

if __name__ == "__main__":
    asyncio.run(call())
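If Cloudflare still blocks the page with the default settings, it may be worth retrying with crawl4ai's anti-bot options turned on. A sketch of the adjusted crawler config, assuming the magic, simulate_user, and override_navigator parameters are available in your version:

# Variant payload with crawl4ai's anti-bot options enabled.
# NOTE: sketch only -- exact parameter availability depends on your version.
crawler_config_payload = {
    "type": "CrawlerRunConfig",
    "params": {
        "stream": False,
        "cache_mode": "bypass",
        "magic": True,              # enable heuristic anti-bot measures
        "simulate_user": True,      # simulate mouse/keyboard activity
        "override_navigator": True  # mask common automation fingerprints
    }
}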
Also, try running your test again using the latest version of the library and Docker image. We released a new version just a few days ago.
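To confirm which version the running container actually serves, you can query its health endpoint (a sketch; the /health route is part of the crawl4ai Docker server, though the response fields may vary by version):

import requests

# Ask the running crawl4ai container for its status/version info
resp = requests.get("http://localhost:11235/health")
print(resp.status_code, resp.json())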
Here are the tutorial videos in case they help:
Library: https://youtu.be/xo3qK6Hg9AA?feature=shared Docker: https://youtu.be/RwT1MlRfbrA?feature=shared
I'll close this issue, but feel free to continue the conversation and tag me if the issue persists with our latest version: 0.7.7.