Document Guzzle options for handling errors & timeouts
Whether using symfony or simplehtmldom, if a timeout is set and the page times out, it throws an exception and everything stops.
Is there a way to suppress exceptions?
The using() method takes a second parameter, where you can specify Guzzle request options (https://docs.guzzlephp.org/en/6.5/request-options.html) — or, if you're in PHP-land, pass in your own Guzzle Client:
{% set crawler = craft.scraper.using('symfony', {
http_errors: false,
timeout: 10,
}).get('https://zombo.com') %}
or
$crawler = Scraper::getInstance()->scraper->using('symfony', $myTweakedGuzzleClient)->get('https://zombo.com');
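For the PHP route, a minimal sketch of how such a pre-configured client might be built — assuming Guzzle 6/7 (GuzzleHttp\Client) is installed via Composer; the variable name $myTweakedGuzzleClient is just the placeholder from the example above:

```php
<?php

use GuzzleHttp\Client;
use TopShelfCraft\Scraper\Scraper;

// Build a Guzzle client with the request options baked in.
$myTweakedGuzzleClient = new Client([
    'http_errors' => false, // don't throw exceptions on 4xx/5xx responses
    'timeout'     => 10,    // seconds; Guzzle's default of 0 waits indefinitely
]);

$crawler = Scraper::getInstance()->scraper
    ->using('symfony', $myTweakedGuzzleClient)
    ->get('https://zombo.com');
```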
I'll add a note about this to the readme.
Cheers Michael,
I managed to figure it out earlier :-)
The problem I'm having now, however, is that if a remote source exceeds the timeout, there's no handling for that and everything halts with an exception. Trying to see if I can do something with this.
cheers
mike
The http_errors option may help with that.
Unfortunately not. I tried that; it seems to only help with 400/500 status codes, but if the server doesn't respond with a code, it hangs.
Hmmmm... Can you try specifying a shorter timeout duration on your Guzzle client, and disabling http_errors?
The default setting for Guzzle is timeout => 0, i.e. Guzzle will wait indefinitely for the server to return. So you may be bumping into PHP's timeout (or Craft's, or the web server's). What we want is for Guzzle to hit its internal timeout and throw a 408, which http_errors can suppress.
I've tried a timeout of 1. A good example is tesco.com; it seems to hang for anything other than standard browsers. I set the timeout to 1 to avoid waiting for sites like this hanging everything up.
I have made a try/catch mod on the attached files as a quick fix; maybe something that could be updated in the repo?
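The attached files aren't shown here, but a hypothetical sketch of that kind of try/catch quick fix follows. It assumes Guzzle's behavior of raising GuzzleHttp\Exception\ConnectException when its own timeout elapses (http_errors only suppresses exceptions for 4xx/5xx status codes, not connection/timeout failures), and the $client options variable from the Twig example:

```php
<?php

use GuzzleHttp\Exception\ConnectException;
use TopShelfCraft\Scraper\Scraper;

$crawler = null;

try {
    $crawler = Scraper::getInstance()->scraper
        ->using('symfony', $client)
        ->get('http://tesco.com');
} catch (ConnectException $e) {
    // The remote host hung or exceeded `timeout`; carry on instead of halting.
}

if ($crawler !== null) {
    // ... safe to filter/parse $crawler here ...
}
```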
cheers
mike
Hi Michael,
Sorry to bother you, but I'm trying to use your plugin and I'm still having problems.
This is my code:
{% set client = {base_uri : 'http://360coupons.com', http_errors : false, allow_redirects : false, timeout: 3} %}
{% set crawler = craft.scraper.using('symfony', client).get(client.base_uri) %}
{% if crawler %} {{ crawler.filter('title').text() }} {% endif %}
However, it doesn't appear to be taking any notice of the Guzzle options; I've set no redirects but it's still getting redirected.
Any ideas?
thanks
mike
The client options seem to work with simplehtmldom, but not with symfony.