You can't run multiple spiders

Open blackhood5678 opened this issue 1 year ago • 3 comments

If you have built multiple spiders and try to run them together, it creates concurrency issues. Assume you have a list of spider classes:

    foreach ($spiders as $spider) {
        Roach::startSpider($spider);
    }

I would expect each spider to run once. Instead, the first spider runs once, the second twice, the third three times, and so on. Maybe I'm doing something stupid, or maybe this isn't how I'm supposed to run multiple spiders. I don't know what the issue is, but I've been looking at the code and I suspect it has to do with how the engine starts a new run, though I'm not sure.

Package versions

  • core: [3.0.0]

blackhood5678 avatar May 24 '24 12:05 blackhood5678

I have the same issue. I'm running 1 spider inside of a foreach loop, and I see multiple duplicate requests being made. By the 10th iteration, I have 10 spiders running, and each of them seems to hold as many copies of startUrls as its index. So on the 10th iteration, spider 1 of 10 requests the link once, spider 2 of 10 requests it twice, spider 3 requests it 3 times, and so on. Even the RequestDeduplicationMiddleware doesn't seem to do anything.

I noticed that if I start 2 different Spider classes, even with 2 separate sets of URLs, multiple requests are made. So it seems that every time Roach::startSpider() is called, a new spider is created, but it will still listen to any overrides, such as startUrls.

claytongray avatar Jun 27 '24 01:06 claytongray

Duplicate/related: #36

A workaround is to run the spider jobs separately, via Laravel queues or some other way of "forking" into different PHP processes.
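
One way to get that isolation without a queue is to launch a fresh PHP process per spider. The following is a minimal sketch, not something Roach provides: it assumes a hypothetical entry script, run_spider.php, that loads the autoloader and calls Roach::startSpider() on the class name passed as its first argument. It also assumes the CLI SAPI, since the STDOUT and STDERR constants are only defined there.

```php
<?php

declare(strict_types=1);

/**
 * Build the command that runs one spider in its own PHP process.
 * `run_spider.php` is a hypothetical entry script, not something Roach ships.
 */
function buildSpiderCommand(string $spiderClass, string $script = 'run_spider.php'): array
{
    return [PHP_BINARY, $script, $spiderClass];
}

/**
 * Run each spider class in a separate process, one after another.
 * Returns the exit code of each process, keyed by spider class.
 */
function runSpidersInIsolation(array $spiderClasses, string $script = 'run_spider.php'): array
{
    $exitCodes = [];

    foreach ($spiderClasses as $spiderClass) {
        // Array-form commands (PHP 7.4+) bypass the shell, so no escaping is needed.
        $process = proc_open(
            buildSpiderCommand($spiderClass, $script),
            [1 => STDOUT, 2 => STDERR], // forward the child's output to ours
            $pipes
        );

        if ($process === false) {
            throw new RuntimeException("Could not start a process for {$spiderClass}");
        }

        // proc_close() blocks until the child exits and returns its exit code.
        $exitCodes[$spiderClass] = proc_close($process);
    }

    return $exitCodes;
}
```

Calling runSpidersInIsolation([FirstSpider::class, SecondSpider::class]) would then run the spiders sequentially. If the duplicate runs really do come from state shared inside a single PHP process, as this thread suspects, each child starts with a clean slate and should not see them.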

joelmellon avatar Sep 01 '24 23:09 joelmellon

A quick and dirty workaround/hack I've used:

  1. Patch Roach.php (add this function to the Roach class):
    public static function killSpider(): void
    {
        self::$container = null;
    }
    
  2. Then after each spider run, call it:
    Roach::startSpider(FirstSpider::class);
    Roach::killSpider();
    Roach::startSpider(SecondSpider::class);
    Roach::killSpider();
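
The same reset could probably be done without editing the vendored file, by using reflection. This is only a sketch under two assumptions: PHP 8.1 or later (where reflection can write private properties without setAccessible()), and that the Roach class keeps its container in the static $container property that the patch above sets to null. resetStaticProperty() is a hypothetical helper, not part of Roach.

```php
<?php

declare(strict_types=1);

/**
 * Set a static property (private ones included) back to null.
 * Throws a TypeError if the property's declared type does not allow null.
 */
function resetStaticProperty(string $class, string $property): void
{
    $reflection = new ReflectionProperty($class, $property);

    if (!$reflection->isStatic()) {
        throw new InvalidArgumentException("{$class}::\${$property} is not static");
    }

    // For static properties the object argument is null.
    $reflection->setValue(null, null);
}

// Intended usage, mirroring step 2 above (untested against Roach itself):
//
//     Roach::startSpider(FirstSpider::class);
//     resetStaticProperty(Roach::class, 'container');
//     Roach::startSpider(SecondSpider::class);
//     resetStaticProperty(Roach::class, 'container');
```

Like the patch, this relies on an internal detail of the library, so it may stop working after an upgrade.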
    

mattheobjornson avatar Jan 21 '25 20:01 mattheobjornson