
Web Scraping Framework

Results 4 crawler issues

PycURL 7.43.0.4 contains a fix for a deprecation warning raised on Python >= 3.8. In Python 3.10 the warning became a hard error, which makes the affected code unusable: SystemError: PY_SSIZE_T_CLEAN macro must be defined...

https://github.com/lorien/crawler/blob/master/crawler/base.py#L86 The `init_hook` method is called inside `__init__`. So, for instance, if you're doing some work (db calls etc.) inside `init_hook`, there is no way to change class attributes before calling init...
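The ordering problem described here can be illustrated with a minimal sketch. `BaseCrawler`, `DbCrawler`, `db_name`, and `connected_to` are hypothetical names made up for the illustration; this is not the library's actual code, only a stand-in with the same "hook runs inside `__init__`" shape.

```python
class BaseCrawler:
    """Hypothetical stand-in for a base class that calls a hook in __init__."""

    def __init__(self):
        # The hook fires before the caller gets the instance back.
        self.init_hook()

    def init_hook(self):
        pass


class DbCrawler(BaseCrawler):
    db_name = "default"  # hypothetical setting read by the hook

    def init_hook(self):
        # Runs during __init__, so it can only see the class-level value.
        self.connected_to = self.db_name


bot = DbCrawler()
bot.db_name = "custom"  # too late: init_hook has already run

print(bot.connected_to)
```

In this sketch the instance attribute is assigned only after construction, so the hook never sees it.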

It would be nice to have something like pre/post request hooks, for instance to detect whether a request is banned by the host or a page does not match defined rules, to save some counters...
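One possible shape for such hooks, as a rough sketch only: every name below (`Request`, `Response`, `HookedFetcher`, `pre_hooks`, `post_hooks`, `fake_transport`) is an assumption made for illustration and does not exist in the library. The transport is faked so the example runs without network access, and treating HTTP 403 as a ban is likewise just an assumed rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    status: int
    body: str = ""


class HookedFetcher:
    """Hypothetical fetcher that runs callbacks around each request."""

    def __init__(self, transport):
        self.transport = transport  # callable: Request -> Response
        self.pre_hooks = []         # each hook is called as hook(request)
        self.post_hooks = []        # each hook is called as hook(request, response)

    def fetch(self, request):
        for hook in self.pre_hooks:
            hook(request)
        response = self.transport(request)
        for hook in self.post_hooks:
            hook(request, response)
        return response


def fake_transport(request):
    # Stand-in for a real HTTP call: pretend the host bans some URLs.
    status = 403 if "blocked" in request.url else 200
    return Response(url=request.url, status=status)


counters = Counter()
fetcher = HookedFetcher(fake_transport)
fetcher.pre_hooks.append(lambda req: counters.update(["requests"]))
fetcher.post_hooks.append(
    lambda req, resp: counters.update(["banned"]) if resp.status == 403 else None
)

fetcher.fetch(Request("https://example.com/ok"))
fetcher.fetch(Request("https://example.com/blocked"))
```

Keeping hooks as plain callables in lists means counters, ban detection, and rule checks can be added without subclassing the fetcher.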

Implement cache backends like the ones in grab. Sometimes they're really useful.
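As a rough idea of what a pluggable cache backend could look like, here is an in-memory sketch. `MemoryCacheBackend`, `fetch_with_cache`, and the `get`/`set` interface are assumptions for illustration; they are not taken from grab or from this project.

```python
import hashlib


class MemoryCacheBackend:
    """Hypothetical backend: stores response bodies keyed by a hash of the URL."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(url):
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def get(self, url):
        # Returns the cached body, or None on a cache miss.
        return self._store.get(self._key(url))

    def set(self, url, body):
        self._store[self._key(url)] = body


def fetch_with_cache(url, cache, download):
    """Return the cached body if present; otherwise download and store it."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    body = download(url)
    cache.set(url, body)
    return body


calls = []


def fake_download(url):
    # Stand-in for a real network fetch; records each call.
    calls.append(url)
    return b"<html>page</html>"


cache = MemoryCacheBackend()
first = fetch_with_cache("https://example.com/", cache, fake_download)
second = fetch_with_cache("https://example.com/", cache, fake_download)
```

A database-backed or file-backed backend could expose the same two methods, so the fetching code would not need to change.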