status-crawler

--required-values spiders ALL

Open: yarekc opened this issue 9 years ago • 1 comment

casperjs --start-url=http://www.proxymis.com --required-values=proxymis.com spider.js

It still spiders links that do not contain proxymis.com, for example:

200 http://www.google-analytics.com/ga.js
200 http://fonts.gstatic.com/s/economica/v4/UK4l2VEpwjv3gdcwbwXE9InF5uFdDttMLvmWuJdhhgs.ttf
200 http://fonts.gstatic.com/s/economica/v4/jObgDQiPUtmACAaaK3pMG6CWcynf_cDxXwCLxiixG1c.ttf
200 http://fonts.gstatic.com/s/lato/v11/v0SdcGFAl2aezM9Vq_aFTQ.ttf
200 http://fonts.gstatic.com/s/lato/v11/nj47mAZe0mYUIySgfn0wpQ.ttf
200 http://connect.facebook.net/fr_FR/all.js#xfbml=1

Shouldn't it spider ONLY resources whose URLs contain the --required-values value?

yarekc, Oct 19 '16 15:10

Hi,

The --required-values parameter is used to determine which URLs to open. In your case, it will follow all links containing "proxymis.com".

To simulate a real user, it also loads every resource the page depends on (CSS, JS, AJAX calls made on page load, etc.). That way, you can also see problems such as files not found and other errors.
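To illustrate the distinction, here is a minimal sketch of the link rule, assuming --required-values is a comma-separated list of substrings and a link is opened when its URL contains at least one of them. The names `parseRequiredValues` and `shouldFollowLink` are hypothetical and are not status-crawler's actual code.

```javascript
// Hypothetical helper (assumption: --required-values is comma-separated).
// Empty entries are dropped so they cannot match every URL.
function parseRequiredValues(raw) {
  return String(raw || '')
    .split(',')
    .map(function (v) { return v.trim(); })
    .filter(function (v) { return v.length > 0; });
}

// A link is followed only if its URL contains at least one required value.
// This check covers pages the spider navigates to, not the sub-resources
// (CSS, JS, fonts, AJAX) a page loads while it renders.
function shouldFollowLink(url, requiredValues) {
  return requiredValues.some(function (value) {
    return url.indexOf(value) !== -1;
  });
}

var required = parseRequiredValues('proxymis.com');
console.log(shouldFollowLink('http://www.proxymis.com/contact', required));
console.log(shouldFollowLink('http://www.google-analytics.com/ga.js', required));
```

Under these assumptions, ga.js would never be opened as a page, yet it still appears in the output because the proxymis.com page itself requests it.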

For your issue, one solution could be a new command-line option that specifies resources to skip. When the page requests a resource, we could do something similar to http://stackoverflow.com/a/22274345/4463145 to abort the request.
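A rough sketch of that idea, assuming a hypothetical `--skip-resources` option holding comma-separated substrings. It listens to CasperJS's `resource.requested` event, whose listener receives the request data and a network request object with an `abort()` method, which is the approach in the linked answer. `shouldSkipResource`, `installResourceFilter`, and the option name are illustrative only and are not part of status-crawler.

```javascript
// Hypothetical: true if the requested URL contains any non-empty skip value.
function shouldSkipResource(url, skipValues) {
  return skipValues.some(function (value) {
    return value.length > 0 && url.indexOf(value) !== -1;
  });
}

// Hypothetical helper: registers a listener on a CasperJS instance that
// aborts matching requests before they are sent (e.g. google-analytics.com).
function installResourceFilter(casper, skipValues) {
  casper.on('resource.requested', function (requestData, networkRequest) {
    if (shouldSkipResource(requestData.url, skipValues)) {
      networkRequest.abort();
    }
  });
}

// Possible wiring inside spider.js (assumes the option is read via casper.cli):
// var casper = require('casper').create();
// var skip = String(casper.cli.get('skip-resources') || '')
//   .split(',')
//   .filter(function (v) { return v.length > 0; });
// installResourceFilter(casper, skip);
```

Usage would then look like `casperjs --start-url=http://www.proxymis.com --required-values=proxymis.com --skip-resources=google-analytics.com,connect.facebook.net spider.js`, again assuming the option were added.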

It would also prevent sending data to Google Analytics.

doxakis, Oct 21 '16 17:10