crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
### Description

- Declare private and public interface.

### Issues

- N/A

### Testing

- N/A

### Checklist

- [x] CI passed
Optimize performance by skipping unnecessary `update_request()` calls in `RequestQueue.reclaim_request()` https://github.com/apify/apify-sdk-python/blob/v1.3.0/src/apify/storages/request_queue.py#L314:318
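A minimal sketch of the optimization idea, assuming a simplified queue and a fake storage client (the real `RequestQueue` in the linked source is more involved; only the `reclaim_request()`/`update_request()` names are taken from it). The point is to skip the API round trip when the request is unchanged and is not being moved to the front of the queue:

```python
import asyncio

class FakeClient:
    """Stand-in storage client that just counts update_request() calls."""
    def __init__(self):
        self.update_calls = 0

    async def update_request(self, request, *, forefront=False):
        self.update_calls += 1

class RequestQueue:
    """Hypothetical reduction of the optimization, not the real class."""
    def __init__(self, client):
        self.client = client
        self.in_progress = set()

    async def reclaim_request(self, request, *, forefront=False):
        # Only pay for the API call when the request was modified or must
        # be moved to the front; previously it was called unconditionally.
        if forefront or request.get("modified", False):
            await self.client.update_request(request, forefront=forefront)
        self.in_progress.discard(request["id"])
```

With this guard, reclaiming an untouched request costs no client call at all, which is where the speedup comes from on large queues.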
Simulate this error in Python and handle it accordingly.

```js
try {
  return await this.client.listItems(options);
} catch (e) {
  const error = e as Error;
  if (error.message.includes('Cannot create a string...
```
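A rough Python analogue of the pattern in the JavaScript snippet, assuming an async client with a `list_items()` method. The error message in the issue is truncated, so `'Cannot create a string'` is used here only as a placeholder prefix:

```python
import asyncio

async def list_items_safe(client, options):
    """Call the client and swallow only the specific oversized-string error."""
    try:
        return await client.list_items(options)
    except Exception as exc:
        # The JS code checks error.message; in Python we inspect str(exc).
        if "Cannot create a string" in str(exc):
            return None  # handle the oversized-response case here
        raise  # re-raise anything we did not anticipate
```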
If the `user_data_dir` option is found in `browser_option`, then `launch_persistent_context` is used as the launch function instead of `launch`, and the `user_data_dir` option is passed to Playwright.

### Description

It makes it possible...
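The dispatch described above can be sketched as follows. The `launch()` and `launch_persistent_context()` names match Playwright's async `BrowserType` API; the surrounding plugin code and the `browser_option` dict shape are simplified assumptions, and a fake browser type stands in for Playwright so the flow is visible:

```python
import asyncio

class FakeBrowserType:
    """Stand-in for Playwright's BrowserType, recording which launcher ran."""
    def __init__(self):
        self.called = None

    async def launch(self, **kwargs):
        self.called = "launch"

    async def launch_persistent_context(self, user_data_dir, **kwargs):
        self.called = f"launch_persistent_context:{user_data_dir}"

async def launch_browser(browser_type, browser_option: dict):
    """Pop user_data_dir and, when present, launch a persistent context
    so the browser reuses the given profile directory."""
    options = dict(browser_option)
    user_data_dir = options.pop("user_data_dir", None)
    if user_data_dir is not None:
        return await browser_type.launch_persistent_context(user_data_dir, **options)
    return await browser_type.launch(**options)
```

Note that `launch_persistent_context()` returns a `BrowserContext` rather than a `Browser`, which is why the calling code has to branch too.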
https://crawlee.dev/python/docs/introduction/saving-data#using-a-context-helper should put the emphasis on using the `push_data` helper; `Dataset.open().push_data()` should only be mentioned later in the article.
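A toy reduction of why the context helper deserves the emphasis: the request handler only needs `context.push_data()` and never has to know which dataset the crawler writes to. The `CrawlingContext` class below is a hypothetical stand-in for crawlee's real context, with a plain list playing the dataset:

```python
import asyncio

class CrawlingContext:
    """Hypothetical reduction of a crawler context."""
    def __init__(self, url, dataset):
        self.url = url
        self._dataset = dataset

    async def push_data(self, item):
        # The helper hides the storage details from the handler.
        self._dataset.append(item)

async def handler(context):
    # The handler never opens a Dataset itself.
    await context.push_data({"url": context.url})
```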
Based on discussion in #347
### Description

- `item_count` unexpectedly incremented when loaded from metadata

### Issues

- Closes: #442

### Testing

- Added `test_reuse_dataset` test

### Checklist

- [ ] CI passed
### Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

### Feature

Please add support for using a user-provided browser profile. ###...
When reusing a dataset with metadata, `item_count` is incremented again after being loaded from the metadata file. This leads to non-contiguous file numbering and breaks multiple functions on datasets (export...
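A minimal sketch of the expected behaviour, under the assumption (from the report) that item files are numbered from `item_count` and the count is persisted in a metadata file. On reopen, `item_count` must be taken verbatim from metadata, with no extra increment, so numbering stays contiguous; the `Dataset` class here is a hypothetical reduction, not crawlee's implementation:

```python
import json
import tempfile
from pathlib import Path

class Dataset:
    """Hypothetical minimal dataset with persisted item_count."""
    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        meta = self.dir / "metadata.json"
        # Load the count as-is; incrementing here is exactly the bug.
        self.item_count = (
            json.loads(meta.read_text())["item_count"] if meta.exists() else 0
        )

    def push_data(self, item):
        self.item_count += 1
        # File names derive from the counter, so any drift creates gaps.
        (self.dir / f"{self.item_count:09}.json").write_text(json.dumps(item))
        (self.dir / "metadata.json").write_text(
            json.dumps({"item_count": self.item_count})
        )
```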
Hello, I'm experiencing performance issues with my web crawler after approximately 1.5 to 2 hours of runtime. The crawling speed significantly decreases to about one site per minute or less,...