
Scraping and crawling with Laravel Dusk

Open clarkewing opened this issue 3 years ago • 4 comments

Hello!

Awesome idea crafting this project, I'm really looking forward to using it when scraping data.

Some websites rely heavily on JavaScript and require interaction to reach certain pieces of information. Is there any way to use something like Laravel Dusk's interactivity features with Roach?

clarkewing avatar Feb 07 '22 17:02 clarkewing

Looks like what you're looking for is the ExecuteJavascriptMiddleware: https://roach-php.dev/docs/downloader-middleware/#executing-javascript
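
For reference, here is a minimal sketch of how that middleware might be registered on a spider, assuming roach-php/core and spatie/browsershot are installed and Puppeteer is available to Node. The spider name, start URL, and selector are hypothetical placeholders, not taken from this thread.

```php
<?php

use RoachPHP\Downloader\Middleware\ExecuteJavascriptMiddleware;
use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

// Hypothetical spider, for illustration only.
class JavascriptPageSpider extends BasicSpider
{
    // Placeholder URL; replace with the page you want to scrape.
    public array $startUrls = [
        'https://example.com',
    ];

    // Registering the middleware here should make Roach replace the
    // response body with the HTML produced after JavaScript has run.
    public array $downloaderMiddleware = [
        ExecuteJavascriptMiddleware::class,
    ];

    public function parse(Response $response): \Generator
    {
        // 'h1' is a placeholder selector.
        yield $this->item([
            'title' => $response->filter('h1')->text(),
        ]);
    }
}
```

Note that this only renders the page once before parsing. It does not click through forms or scroll the way a Dusk browser session would.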

ksassnowski avatar Feb 09 '22 05:02 ksassnowski

Hi @ksassnowski, thanks for the reply.

I believe my use case differs as the idea isn't to collect data from a static page, but rather to navigate on the page. Allow me to explain:

Essentially, I'm trying to scrape data from an interactive form which presents questions one at a time. Moving forward through the form requires filling out questions and selecting responses on the same page, using buttons which trigger JavaScript.

Currently, I'm doing this with Laravel Dusk and running a test which makes no assertions but exports scraped data to a database. It's not a perfect system, but it works. Seeing how promising Roach looks, I was hoping I'd be able to implement this through a Dusk implementation in Roach. Do you have any plans to support such functionality?

Thanks for the awesome work!

clarkewing avatar Feb 14 '22 12:02 clarkewing

I'm looking for similar functionality so that I can scroll down a page and load data from an infinite scroll. @ksassnowski, do you have any suggestions?

joskfg avatar Mar 27 '22 17:03 joskfg

ExecuteJavascriptMiddleware

I want to use this middleware, but it is not firing. I'm using Laravel Sail on an M1 MacBook, and installing Puppeteer failed because Chromium was not arm64 ready, so I did the following.

I installed spatie/browsershot and ran

sail PUPPETEER_EXPERIMENTAL_CHROMIUM_MAC_ARM=1 npm i puppeteer

It then seemed to install, but the ExecuteJavascriptMiddleware is not being called, so I still get the:

<noscript>You need to enable JavaScript to run this app.</noscript>\n

version of the page returned.

Am I doing something wrong?

andyscraven avatar Jul 26 '22 02:07 andyscraven

I'm closing this for now as I consider a Dusk integration to be out of scope of this library.

ksassnowski avatar Sep 22 '22 10:09 ksassnowski