crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

Results 229 crawlee-python issues

### Description
- declare private and public interface
### Issues
- N/A
### Testing
- N/A
### Checklist
- [x] CI passed

t-tooling
adhoc
debt

Optimize performance by skipping unnecessary `update_request()` calls in `RequestQueue.reclaim_request()` https://github.com/apify/apify-sdk-python/blob/v1.3.0/src/apify/storages/request_queue.py#L314:318

t-tooling
debt

Simulate this error in Python and handle it accordingly. ```js try { return await this.client.listItems(options); } catch (e) { const error = e as Error; if (error.message.includes('Cannot create a string...

t-tooling
debt

If the `user_data_dir` option is found in `browser_option`, the launch function used is `launch_persistent_context` instead of `launch`, and the `user_data_dir` option is passed to Playwright.
### Description
it makes possible...

t-tooling
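
The launch selection described in this entry can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `launch_browser` and the shape of `browser_options` are hypothetical, and `browser_type` is only assumed to expose Playwright's `launch` and `launch_persistent_context` methods.

```python
from typing import Any


def launch_browser(browser_type: Any, browser_options: dict[str, Any]) -> Any:
    """Pick the launch call based on whether `user_data_dir` is set.

    Hypothetical helper: `browser_type` is assumed to expose Playwright's
    `launch()` and `launch_persistent_context()` methods.
    """
    options = dict(browser_options)  # copy so the caller's dict is not mutated
    user_data_dir = options.pop("user_data_dir", None)
    if user_data_dir is not None:
        # A persistent context stores the browser profile in `user_data_dir`.
        return browser_type.launch_persistent_context(user_data_dir, **options)
    # No profile directory given: fall back to a regular launch.
    return browser_type.launch(**options)
```

With Playwright's async API both methods return awaitables, so a caller would `await` the value returned by this helper.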

https://crawlee.dev/python/docs/introduction/saving-data#using-a-context-helper should put emphasis on using the `push_data` helper; `Dataset.open().push_data()` should only be mentioned later in the article.

documentation
t-tooling

### Description
- `item_count` unexpected increment when loaded from metadata
### Issues
- Closes: #442
### Testing
- Added `test_reuse_dataset` test
### Checklist
- [ ] CI passed

t-tooling

### Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
### Feature
Please add support for using user provided browser profile
###...

enhancement
t-tooling

When reusing a dataset with metadata, `item_count` is incremented after being loaded from the metadata file. This leads to non-continuous file increments and breaks multiple functions on Datasets (export...

bug
t-tooling
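
The behaviour this bug report expects can be illustrated with a toy model. Everything here is a hypothetical stand-in rather than Crawlee's implementation: the `ToyDataset` class, the metadata file name, and the zero-padded item file names are assumptions made for the sketch. The point is only that a count loaded from metadata is used as-is, so file numbering stays continuous when the dataset is reused.

```python
import json
import tempfile
from pathlib import Path


class ToyDataset:
    """Toy file-backed dataset (hypothetical; not Crawlee's implementation)."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.metadata_path = directory / "__metadata__.json"  # assumed name
        if self.metadata_path.exists():
            # Reuse: take the stored count as-is, without incrementing it.
            self.item_count = json.loads(self.metadata_path.read_text())["item_count"]
        else:
            self.item_count = 0

    def push_data(self, item: dict) -> Path:
        """Write one item to the next numbered file and persist the count."""
        self.item_count += 1
        path = self.directory / f"{self.item_count:09d}.json"
        path.write_text(json.dumps(item))
        self.metadata_path.write_text(json.dumps({"item_count": self.item_count}))
        return path


with tempfile.TemporaryDirectory() as tmp:
    first = ToyDataset(Path(tmp))
    first.push_data({"n": 1})
    first.push_data({"n": 2})
    reopened = ToyDataset(Path(tmp))  # simulates reusing the dataset
    third_path = reopened.push_data({"n": 3})
    third_name = third_path.name
```

If the constructor incremented the loaded count, the third item would land in file 4 and leave a gap at 3, which is the kind of discontinuity the report describes.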

Hello, I'm experiencing performance issues with my web crawler after approximately 1.5 to 2 hours of runtime. The crawling speed significantly decreases to about one site per minute or less,...

bug
t-tooling