
Integrate parts of Scrapy CrawlSpider Rule flexibility into DDS

Open: holgerd77 opened this issue 13 years ago • 4 comments

With CrawlSpider Rules, Scrapy (http://doc.scrapy.org/en/0.14/topics/spiders.html) offers a more powerful tool for crawling pages from different URLs that follow a certain pattern than what DDS currently implements with pagination.

See the following Google Groups discussion thread for reference: https://groups.google.com/forum/?fromgroups#!topic/django-dynamic-scraper/tQJMpcbqbfc

It would be desirable to integrate at least a part of it.

Ideas:

  • Applying a single "allow" rule could be integrated as a new pagination type, reusing the pagination_append_str attribute, without changing the DB structure
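A minimal sketch of what such an "allow"-rule pagination type might do, assuming a hypothetical pagination type (called ALLOW_RULE here) whose pagination_append_str field holds a regular expression instead of a URL suffix. The helper name allow_rule_pages is also hypothetical. It mimics the semantics of the allow argument of a Scrapy link extractor (a link is followed if its absolute URL matches the regex via re.search), but uses only the standard library so it does not depend on a particular Scrapy version.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def allow_rule_pages(base_url, html, pagination_append_str):
    """Hypothetical ALLOW_RULE pagination type: interpret
    pagination_append_str as an "allow" regex and return the
    absolute, de-duplicated URLs on the start page that match it."""
    pattern = re.compile(pagination_append_str)
    collector = _LinkCollector()
    collector.feed(html)
    pages, seen = [], set()
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if pattern.search(url) and url not in seen:
            seen.add(url)
            pages.append(url)
    return pages


html = (
    '<a href="/news/page/2/">2</a><a href="/about/">About</a>'
    '<a href="/news/page/3/">3</a><a href="/news/page/2/">next</a>'
)
print(allow_rule_pages("http://example.com/news/", html, r"/news/page/\d+/$"))
# ['http://example.com/news/page/2/', 'http://example.com/news/page/3/']
```

Reusing pagination_append_str as the regex keeps the existing scraper model untouched; the trade-off is that the field's meaning would then depend on the selected pagination type.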

holgerd77, Aug 06 '12 20:08

Could we inherit from CrawlSpider, similar to the way you implemented DjangoBaseSpider by inheriting from BaseSpider?

What do you think about that?

kevinwan, Aug 07 '12 17:08

I think it should be no problem to replace BaseSpider with CrawlSpider, but that still leaves the task of integrating some of its functionality into DDS in an appropriate way. Or would this replacement alone already help you in some way?

holgerd77, Aug 12 '12 10:08

Is there still interest in this feature? My team might want to develop this feature.

heysamtexas, Aug 12 '15 09:08

Definitely still interesting. Before you start developing, it would be helpful if you laid out here how you would implement this feature and how it fits into the existing DDS structure with regard to code, DB, and admin UI.

Also, for a new feature to be accepted, all unit tests must pass; see: http://django-dynamic-scraper.readthedocs.org/en/latest/development.html#running-the-test-suite

If you want to make a pull request, ping me before opening it, and I will create a separate experimental branch for merging.

Cheers Holger

holgerd77, Aug 12 '15 13:08