newspaper-crawler-scripts
Limiting URLs for testing - Make MAX_COUNT configurable via CLI arguments
I need to test the script to check that it works, in particular the extraction of dates, headlines, etc.
But it seems to download everything before the extraction part runs; it has been going for more than 30 minutes now.
Is there a way to limit the number of crawled URLs so that I can verify the script works? I don't have the resources for a full crawl.
I'm using MultiThreadedCrawler2.
You can change the following line to have that effect. This should be made configurable; I will work on that.
https://github.com/vanangamudi/newspaper-crawler-scripts/blob/master/crawler.py#L89
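In the meantime, a small sketch of how the limit could be exposed as a command-line flag using `argparse`. Note this is a hypothetical illustration, not the repository's actual code: the names `MAX_COUNT`, `--max-count`, and the default value are assumptions, and the real variable at the linked line may be named or used differently.

```python
import argparse

# Assumed default; pick a small value when testing so the crawl stops early.
DEFAULT_MAX_COUNT = 10


def parse_args(argv=None):
    """Parse CLI arguments; --max-count caps how many URLs get crawled."""
    parser = argparse.ArgumentParser(description="newspaper crawler (sketch)")
    parser.add_argument(
        "--max-count",
        type=int,
        default=DEFAULT_MAX_COUNT,
        help="stop after crawling this many URLs (useful for quick tests)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # The crawler's loop would then check its URL counter against
    # args.max_count instead of a hard-coded constant.
    print(f"crawling at most {args.max_count} URLs")
```

With this in place, a quick smoke test becomes `python crawler.py --max-count 5`, so the extraction logic can be verified on a handful of pages without a full crawl.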