Botasaurus can't pass CF
Cloudflare (CF) seems able to detect Botasaurus. The IP is not banned, and the problem is not OS-related: I get the same behavior on Windows 11 and on Ubuntu Server.
If I omit the "wait" parameter, I get a different error (for example, the "id" not found).
The script:
from botasaurus.browser import browser, Driver

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    # Visit the GitLab sign-in page via Google, with the Cloudflare bypass enabled
    driver.google_get("https://gitlab.com/users/sign_in", bypass_cloudflare=True, wait=10)
    # Retrieve the heading element's text
    heading = driver.get_text("h1")
    # Save the data as a JSON file in output/scrape_heading_task.json
    return {
        "heading": heading
    }

# Initiate the web scraping task
scrape_heading_task()
The log:
Traceback (most recent call last):
  File "/root/.venv/lib/python3.12/site-packages/botasaurus/browser_decorator.py", line 176, in run_task
    result = func(driver, data)
             ^^^^^^^^^^^^^^^^^^
  File "/root/pricecheckgrbots/delete.py", line 6, in scrape_heading_task
    driver.google_get("https://gitlab.com/users/sign_in", bypass_cloudflare=True, wait=10)
  File "/root/.venv/lib/python3.12/site-packages/botasaurus_driver/driver.py", line 536, in google_get
    self.get_via(link, "https://www.google.com/", bypass_cloudflare=bypass_cloudflare, wait=wait)
  File "/root/.venv/lib/python3.12/site-packages/botasaurus_driver/driver.py", line 522, in get_via
    self.detect_and_bypass_cloudflare()
  File "/root/.venv/lib/python3.12/site-packages/botasaurus_driver/driver.py", line 878, in detect_and_bypass_cloudflare
    bypass_if_detected(self)
  File "/root/.venv/lib/python3.12/site-packages/botasaurus_driver/solve_cloudflare_captcha.py", line 122, in bypass_if_detected
    wait_till_cloudflare_leaves(driver, previous_ray_id, raise_exception)
  File "/root/.venv/lib/python3.12/site-packages/botasaurus_driver/solve_cloudflare_captcha.py", line 64, in wait_till_cloudflare_leaves
    raise CloudflareDetectionException()
botasaurus_driver.exceptions.CloudflareDetectionException: Cloudflare has detected us.
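If the CloudflareDetectionException in the traceback above turns out to be intermittent, one option is to retry the page load a few times before giving up. The following is only a minimal sketch: `retry`, `CloudflareDetected`, and `flaky_load` are hypothetical names made up for illustration, and `CloudflareDetected` merely stands in for `botasaurus_driver.exceptions.CloudflareDetectionException` so that the snippet runs without a browser. It has not been verified to get past Cloudflare.

```python
import time


class CloudflareDetected(Exception):
    """Hypothetical stand-in for the library's CloudflareDetectionException."""


def retry(fn, exceptions, attempts=3, delay=0.0):
    """Call fn() up to `attempts` times, retrying only on `exceptions`."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as error:
            last_error = error
            if attempt < attempts:
                # Linear back-off: wait a little longer after each failure
                time.sleep(delay * attempt)
    raise last_error


# Demo: a fake page load that fails twice, then succeeds
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise CloudflareDetected("Cloudflare has detected us.")
    return "loaded"


result = retry(flaky_load, (CloudflareDetected,), attempts=3, delay=0.0)
print(result, "after", calls["count"], "calls")
```

In a real task, `fn` would presumably be a small function that wraps the `driver.google_get(...)` call, and `exceptions` would be the library's own exception class. Whether a retry helps at all depends on why Cloudflare flagged the session.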
I tried your code on my Ubuntu system and it works fine for me. If you still have no luck, try this:
from botasaurus.browser import browser, Driver
import time

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://gitlab.com/users/sign_in")
    time.sleep(2)
    iframe = driver.select_iframe("#turnstile-wrapper iframe")
    checkbox = iframe.select('label', None)
    if checkbox:
        checkbox.click()
    driver.prompt()
    driver.save_screenshot()
    heading = driver.get_text("h1")
    return heading

# Initiate the web scraping task
scrape_heading_task()
If necessary you might have to use proxies to access the site.
It is still not working on Ubuntu Server (no GUI).
I have the same IP as my Windows machine. On Windows the script works without any problems.
On Linux I tried this:
from botasaurus.browser import browser, Driver
import time

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://gitlab.com/users/sign_in")
    time.sleep(10)
    iframe = driver.select_iframe("#turnstile-wrapper iframe")
    driver.save_screenshot()
    checkbox = iframe.select('label', None)
    if checkbox:
        print("detected checkbox")
        checkbox.click()
    time.sleep(1)
    driver.save_screenshot()
    driver.prompt()
    driver.save_screenshot()
    heading = driver.get_text("h1")
    return heading

# Initiate the web scraping task
scrape_heading_task()
It seems that the checkbox isn't clicked (see the second screenshot). If I increase the sleep from 10 to 30 seconds, the Turnstile widget disappears!
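Since the outcome changes with the length of the fixed sleep, polling for a condition may be more robust than guessing a number of seconds. Below is a minimal sketch under that assumption: `wait_until` and `challenge_gone` are hypothetical names, and the demo condition is a plain counter rather than a real browser check.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demo: a condition that becomes true on the third check
checks = {"count": 0}


def challenge_gone():
    checks["count"] += 1
    return checks["count"] >= 3


outcome = wait_until(challenge_gone, timeout=5.0, interval=0.01)
print("condition met:", outcome, "after", checks["count"], "checks")
```

Inside the task, the predicate would be whatever check indicates that the challenge has finished (for instance, that the expected heading is present). How to express that check with the driver is left open here.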
Resolved
- Update the packages: python -m pip install bota botasaurus botasaurus-api botasaurus-requests botasaurus-driver bota botasaurus-proxy-authentication botasaurus-server botasaurus-humancursor --upgrade
- Run:
from botasaurus.browser import browser, Driver

@browser
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://nopecha.com/demo/cloudflare", bypass_cloudflare=True)
    driver.prompt()

scrape_heading_task()
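To confirm that the upgrade actually took effect in the environment that runs the script, the installed versions can be printed with the standard library. This is a small sketch; `installed_versions` is a hypothetical helper name, and the package list is just a subset of the names from the pip command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


packages = ["botasaurus", "botasaurus-driver"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this with the same interpreter as the scraper (for example the one inside the virtual environment) helps rule out the case where pip upgraded a different Python installation.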