Requests with ExecuteJavascriptMiddleware are sent twice

Open • Fadarrizz opened this issue 1 year ago • 1 comment

Describe the bug

When using the ExecuteJavascriptMiddleware, two requests are sent: one by Browsershot, the other by Guzzle.

Reproduction

My spider only has the ExecuteJavascriptMiddleware registered as downloader middleware.

I placed dumps with the request object id in the following places, just before requests are sent:

  • RoachPHP\Http\Client [image]
  • RoachPHP\Downloader\Middleware\ExecuteJavascriptMiddleware [image]

When the spider runs, both dumps are shown: [image]

Expected behavior

I expected only one request to be sent: the one from Browsershot, without an additional one from Guzzle.

Package versions (please complete the following information):

  • core: v3.0.1

Fadarrizz • Apr 09 '24 08:04

Since the ExecuteJavascriptMiddleware handles a response, it is only called once a response has been received. So the HTTP client has already sent the request by the time it reaches the middleware.

My feeling is that the middleware should be able to handle the request before it is handled by something else.

Looking at how Scrapy does this, a downloader middleware can process requests and can return a response. When a response is returned by the middleware, no other request processing is done, only response processing. https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.DownloaderMiddleware.process_request
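
To make the idea concrete, here is a minimal sketch of that short-circuit behavior in plain Python. It is only a model, not Scrapy's or Roach's actual API: `Request`, `Response`, `BrowserMiddleware`, `Downloader`, and `fake_http_send` are all hypothetical names made up for illustration, and response processing is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    body: str
    source: str  # which component produced this response


class BrowserMiddleware:
    """Hypothetical middleware that fetches the page itself (e.g. via a headless browser)."""

    def process_request(self, request: Request) -> Optional[Response]:
        # Returning a Response tells the downloader to skip the HTTP client.
        return Response(request.url, "<html>rendered</html>", source="browser")


class Downloader:
    """Hypothetical downloader that lets request middleware short-circuit the HTTP call."""

    def __init__(self, middlewares: List[BrowserMiddleware],
                 http_send: Callable[[Request], Response]) -> None:
        self.middlewares = middlewares
        self.http_send = http_send

    def download(self, request: Request) -> Response:
        for middleware in self.middlewares:
            result = middleware.process_request(request)
            if isinstance(result, Response):
                # A middleware already produced a response: no HTTP request is sent.
                return result
        # No middleware answered, so fall back to the regular HTTP client.
        return self.http_send(request)


http_calls: List[str] = []


def fake_http_send(request: Request) -> Response:
    # Stand-in for the HTTP client; records each request it would have sent.
    http_calls.append(request.url)
    return Response(request.url, "<html>raw</html>", source="http")


downloader = Downloader([BrowserMiddleware()], fake_http_send)
response = downloader.download(Request("https://example.com"))
print(response.source, len(http_calls))  # prints "browser 0"
```

With this shape, the browser-based middleware is the only component that fetches the page, which is the single-request behavior I was expecting.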

@ksassnowski What do you think is a suitable solution for this?

Fadarrizz • Apr 09 '24 12:04