Integrate parts of Scrapy CrawlSpider Rule flexibility into DDS
Scrapy's CrawlSpider Rules (http://doc.scrapy.org/en/0.14/topics/spiders.html) are a more powerful tool for crawling pages from different URLs following a certain pattern than what is currently realized in DDS with pagination.
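For context, a CrawlSpider Rule pairs a link extractor (typically with an "allow" regex) with a follow/callback behaviour: every link on a crawled page whose URL matches the pattern is followed automatically. The idea can be sketched without Scrapy using plain regex matching over a fake link graph (the site, URLs, and pattern below are made up for illustration, not part of DDS):

```python
import re

# A tiny fake site: each URL maps to the links found on that page.
FAKE_SITE = {
    "http://example.com/": ["http://example.com/list?p=1", "http://example.com/about"],
    "http://example.com/list?p=1": ["http://example.com/list?p=2", "http://example.com/item/1"],
    "http://example.com/list?p=2": ["http://example.com/item/2"],
    "http://example.com/about": [],
    "http://example.com/item/1": [],
    "http://example.com/item/2": [],
}

def crawl(start, allow_pattern):
    """Follow every link matching allow_pattern, breadth-first,
    roughly like a CrawlSpider Rule with follow=True would."""
    allow = re.compile(allow_pattern)
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        for link in FAKE_SITE.get(url, []):
            if allow.search(link):
                queue.append(link)
    return sorted(seen)

# Follow only the paginated list pages, ignoring /about and the items:
print(crawl("http://example.com/", r"/list\?p=\d+"))
```

Unlike DDS pagination, which constructs page URLs up front from a template, the Rule-style crawl discovers follow-up URLs on the pages themselves.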
See the following Google Groups discussion thread for reference: https://groups.google.com/forum/?fromgroups#!topic/django-dynamic-scraper/tQJMpcbqbfc
It would be desirable to integrate at least a part of it.
Ideas:
- Applying a single "allow" Rule could be integrated as a pagination type, combined with the pagination_append_str attribute, without changing the DB structure
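A rough sketch of how such a pagination type might behave, reusing pagination_append_str as the allow pattern. The function name and semantics are an assumption for discussion, not existing DDS code, and the URLs are invented:

```python
import re

def allow_rule_pagination(base_url, page_links, pagination_append_str):
    """Hypothetical 'allow rule' pagination type: instead of appending
    pagination_append_str to the base URL (as the existing append-str
    pagination does), interpret it as a regex and follow every on-page
    link under the base URL that matches it."""
    allow = re.compile(pagination_append_str)
    return [url for url in page_links
            if url.startswith(base_url) and allow.search(url)]

# Hypothetical links scraped from a page:
page_links = [
    "http://example.com/articles?page=2",
    "http://example.com/contact",
    "http://other-site.com/articles?page=2",
]
print(allow_rule_pagination("http://example.com", page_links, r"\?page=\d+"))
# only the example.com paginated link would be followed
```

Since the pattern lives in an existing attribute, this would only need a new choice in the pagination-type field rather than a schema change.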
Could we inherit from CrawlSpider, similar to the way DjangoBaseSpider is implemented by inheriting from BaseSpider? What do you think about that?
I think it should be no problem to replace BaseSpider with CrawlSpider, but that still leaves the task of integrating some of its functionality into DDS in an appropriate way. Or would this replacement already help you in some way?
Is there still interest in this feature? My team might want to develop this feature.
Definitely still interesting. Before you start developing, it would definitely be good/helpful if you laid out here how you would implement this feature and how it fits into the existing DDS structure regarding code, DB, and admin UI.
It is also a prerequisite for a new feature to be accepted that all the unit tests pass; see: http://django-dynamic-scraper.readthedocs.org/en/latest/development.html#running-the-test-suite
If you want to make a pull request, ping me before opening it; I would create a separate experimental branch for merging.
Cheers, Holger