grab

Web Scraping Framework

Results 16 grab issues

There are small typos in:
- docs/grab/transport.rst
- docs/spider/intro.rst
- docs/spider/task.rst
- docs/spider/transport.rst
- docs/usage/installation.rst
- grab/base.py
- grab/spider/queue_backend/redis.py

Fixes:
- Should read `access` rather than `acess`.
- Should read...

```
/lib/python3.10/site-packages/grab/spider/base_service.py", line 64, in is_alive
    return self.thread.isAlive()
AttributeError: 'Thread' object has no attribute 'isAlive'. Did you mean: 'is_alive'?
```
Python 3.10, but I saw the same problem in 3.9.
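For context, `threading.Thread.isAlive()` was removed in Python 3.9, and `is_alive()` is the supported spelling. The following is a minimal sketch of the call that works on current Python versions. The helper name `thread_is_alive` is hypothetical and is not taken from Grab's code; it only illustrates the method name the traceback suggests.

```python
import threading


def thread_is_alive(thread: threading.Thread) -> bool:
    """Report whether a thread is still running.

    Hypothetical helper: it calls is_alive(), because the camelCase
    isAlive() alias no longer exists on Python 3.9 and later.
    """
    return thread.is_alive()


# Keep the worker blocked on an event so its state is deterministic.
stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
running = thread_is_alive(worker)   # worker is blocked in stop.wait()

stop.set()                          # let the worker finish
worker.join()
finished = thread_is_alive(worker)  # worker has exited after join()
```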

Content of `grab.response.head` at the moment the error happened: ``` b'HTTP/1.1 200 OK\r\nDate: Tue, 13 Jun 2017 22:16:36 GMT\r\nServer: Apache\r\nSet-Cookie: \xb3\xd2\xda\xcd\xd7=%96%A6g%9Ay%B0%A5g%A7tm%7C%95%9A; expires=Tue, 25-Jul-2017 14:16:36 GMT; path=/\r\nX-Powered-By: Apache2\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 4974\r\nContent-Type:...

bug
cookies

Affected file: [grab/document.py](https://github.com/lorien/grab/blob/master/grab/document.py#L38)
```
>>> import libgenapi
...
/usr/local/lib/python3.9/site-packages/grab/document.py:35: DeprecationWarning: defusedxml.lxml is no longer supported and will be removed in a future release.
  import defusedxml.lxml
```
The defusedxml.lxml subpackage will...

bug

I want to configure CURLOPT_RESOLVE to point to a specific IP address, so in create_grab_instance() I wrote:
```
...
g.setup_transport('pycurl')
g.transport.curl.setopt(pycurl.RESOLVE, ['api.somesite.com:443:{}'.format(ip)])
return g
```
When I call spider.run(), I get the following...
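For reference, libcurl's CURLOPT_RESOLVE option takes a list of entries in the form `HOST:PORT:ADDRESS`. The sketch below only shows how such an entry is assembled; it does not reproduce or fix the error reported in this issue. The helper name `build_resolve_entry` is hypothetical, and the IP address is a documentation placeholder, not a value from the report.

```python
def build_resolve_entry(host: str, port: int, ip: str) -> str:
    """Build one CURLOPT_RESOLVE entry in libcurl's HOST:PORT:ADDRESS form.

    Hypothetical helper for illustration only.
    """
    return f"{host}:{port}:{ip}"


# 203.0.113.10 is a placeholder from the documentation address range.
entries = [build_resolve_entry("api.somesite.com", 443, "203.0.113.10")]

# With pycurl installed, a list like this is what
# curl.setopt(pycurl.RESOLVE, entries) expects.
```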

bug

Sometimes there is a real need to send a header field with an empty value. The example [here](https://curl.haxx.se/libcurl/c/httpcustomheader.html) explains how to do that.
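In libcurl, a header line written as `Name:` with nothing after the colon disables that header, while `Name;` with a trailing semicolon sends it with an empty value. The sketch below shows one way header lines could be formatted for that convention. The function `format_curl_headers` is hypothetical and is not part of Grab; how Grab itself builds its header list is not shown here.

```python
def format_curl_headers(headers: dict[str, str]) -> list[str]:
    """Format headers as libcurl-style lines.

    Hypothetical helper: an empty value becomes "Name;" so that libcurl
    sends the header with no content instead of dropping it.
    """
    lines = []
    for name, value in headers.items():
        if value == "":
            lines.append(f"{name};")
        else:
            lines.append(f"{name}: {value}")
    return lines


header_lines = format_curl_headers({"Accept": "text/html", "X-Custom": ""})

# With pycurl installed, a list like this is the shape that
# curl.setopt(pycurl.HTTPHEADER, header_lines) accepts.
```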

bug

It works on my dev Debian machine. It fails in the GitHub Ubuntu CI environment. ```python @only_grab_transport("pycurl") def test_different_domains(self): import pycurl # pylint: disable=import-outside-toplevel grab = build_grab() names = [ "foo:%d:127.0.0.1"...

bug
cookies
domain name resolving

The documentation URL redirects to the old page. Now it will redirect to the new one.