Ehsan U. issues

Results: 13 issues of Ehsan U.

Bypass Datadome, which uses GeeTest

Datadome currently uses reCAPTCHA v2 and GeeTest captchas. A solution is needed for GeeTest.

add support for remote firefox and webkit

#277

scrapy-playwright connect to remote firefox instance

Currently, **scrapy-playwright** only supports Chromium for connecting to remote browser instances over CDP (Chrome DevTools Protocol). Firefox is quite effective at evading detection by some anti-bot measures. Is there any...

enhancement

HTTP API for Spider

`Scrapy` offers an HTTP API for spiders through a third-party library called `ScrapyRT`. By sending a request to `ScrapyRT` with the spider name and URL,...

enhancement

t-tooling
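To make the `ScrapyRT` request described in the "HTTP API for Spider" issue concrete, here is a minimal sketch that only builds the request URL. It assumes a `ScrapyRT` server on its default local address (`http://localhost:9080`) and its `crawl.json` endpoint; the spider name `quotes` and the target URL are hypothetical placeholders, and `build_scrapyrt_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode


def build_scrapyrt_url(spider_name: str, url: str,
                       base: str = "http://localhost:9080") -> str:
    """Build the crawl.json URL asking ScrapyRT to run `spider_name` on `url`.

    `base` is an assumption: adjust it to wherever ScrapyRT is listening.
    """
    # urlencode percent-escapes the target URL so it is safe as a query value.
    query = urlencode({"spider_name": spider_name, "url": url})
    return f"{base}/crawl.json?{query}"


# Hypothetical spider name and target URL, for illustration only.
request_url = build_scrapyrt_url("quotes", "https://quotes.toscrape.com/")
print(request_url)
```

Sending a GET request to the resulting URL with any HTTP client should return the scraped items as JSON, provided the server is running inside the Scrapy project that defines the spider.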

add support for Parsel

`BeautifulSoup` lacks proper type hints (most values are typed as `Any`), so IDE autocompletion is not effective. A solid alternative is [Parsel](https://github.com/scrapy/parsel). It supports CSS selectors, XPath expressions for HTML and XML, JMESPath for...

enhancement

t-tooling

Chained Requests

How can `Crawlee` be used when requests need to be sent in sequence, as in most `ASP.NET` applications? `Scrapy` handles these cases using inline requests without a callback. E.g. here couple...

t-tooling
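One reason requests must be chained in `ASP.NET` Web Forms applications is that each response carries hidden form fields (such as `__VIEWSTATE`) that the next POST has to echo back. The sketch below illustrates only that step, using the Python standard library; it is not `Crawlee` or `Scrapy` code. `HiddenFieldParser`, `next_form_data`, the sample HTML, and the `ctl00$NextPage` event target are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only named hidden inputs; a missing value becomes "".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def next_form_data(html: str, event_target: str) -> dict[str, str]:
    """Build the form payload for the next request in the chain."""
    parser = HiddenFieldParser()
    parser.feed(html)
    data = dict(parser.fields)
    # __EVENTTARGET names the control that triggers the postback.
    data["__EVENTTARGET"] = event_target
    return data


# Hypothetical response body from the previous request in the sequence.
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc123">'
    '<input type="text" name="q"></form>'
)
payload = next_form_data(sample, "ctl00$NextPage")
print(payload)
```

Each step in the chain would parse the previous response this way and post the resulting payload, which is why such requests cannot simply be enqueued independently.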

Add rendering service to improve scalability

**Improvements**:
- Removed Playwright from the `playwright-service`
- The Browser instance (Playwright) can now run and scale independently, making it compatible with microservices architecture.
- Support for ad blocking and...

[WARNING]. ⚠ Both crawler_config and legacy parameters provided. crawler_config will take precedence.

I didn't specify `crawler_config`, but I am still getting this warning. It occurs only with the `arun_many` method. ![image](https://github.com/user-attachments/assets/2c49cb78-97fa-48cc-a01a-f324bd7b2783) Possible cause: ![image](https://github.com/user-attachments/assets/7c07fd43-ccc3-419c-abb0-17d8611747ea)

TimeoutError on getting page source

I tried the example and even that does not work, although the page was fully loaded. `page_source` triggers the `TimeoutError`:

```python
import asyncio
from pydoll.browser.chrome import Chrome

async def main():
    async...
```

Event Not Being Captured

Hi, thanks for such amazing work! I wonder why this snippet is not able to capture the event:

```python
import asyncio
from pydoll.browser.chrome import Chrome
from pydoll.events.page import...
```