newspaper-crawler-scripts
Limiting URLs for testing - Make MAX_COUNT configurable via CLI arguments
I need to test the script to check that it works, in particular the extraction of dates, headlines, etc.
But it seems to download everything before the extraction part runs; it has been going for more than 30 minutes now.
Is there a way to limit the number of crawled URLs so that I can verify the script works? I don't have the resources for a full crawl.
I'm using MultiThreadedCrawler2.
You can change the following line to have that effect. This should be made configurable; I will work on that.
https://github.com/vanangamudi/newspaper-crawler-scripts/blob/master/crawler.py#L89
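In the meantime, a small sketch of how the limit could be exposed as a command-line flag using `argparse`. Note this is a hypothetical illustration, not the repository's actual code: the names `MAX_COUNT`, `--max-count`, and the default value are assumptions, and the real variable at the linked line may be named or used differently.

```python
import argparse

# Assumed default; pick a small value when testing so the crawl stops early.
DEFAULT_MAX_COUNT = 10


def parse_args(argv=None):
    """Parse CLI arguments; --max-count caps how many URLs get crawled."""
    parser = argparse.ArgumentParser(description="newspaper crawler (sketch)")
    parser.add_argument(
        "--max-count",
        type=int,
        default=DEFAULT_MAX_COUNT,
        help="stop after crawling this many URLs (useful for quick tests)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # The crawler's loop would then check its URL counter against
    # args.max_count instead of a hard-coded constant.
    print(f"crawling at most {args.max_count} URLs")
```

With this in place, a quick smoke test becomes `python crawler.py --max-count 5`, so the extraction logic can be verified on a handful of pages without a full crawl.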