Cristÿ Constantin
Ignore empty OpenGraph properties or content. Fixes #117. Also upgraded `six` to >=1.11 so that the Python 3.4 tests pass (this was also done in https://github.com/scrapinghub/extruct/pull/119).
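A minimal sketch of the filtering idea, assuming the parser has already collected `(property, content)` pairs from `<meta>` tags. `filter_og_pairs` is a hypothetical helper for illustration, not extruct's actual API:

```python
def filter_og_pairs(pairs):
    """Keep only OpenGraph pairs whose property and content are both non-empty."""
    kept = []
    for prop, content in pairs:
        # Treat None, "" and whitespace-only strings as empty and skip them.
        if not prop or not prop.strip():
            continue
        if not content or not content.strip():
            continue
        # Values are kept unchanged; only the emptiness check strips whitespace.
        kept.append((prop, content))
    return kept


pairs = [("og:title", "Hello"), ("og:image", ""), ("", "orphan"), ("og:type", "article")]
print(filter_og_pairs(pairs))  # [('og:title', 'Hello'), ('og:type', 'article')]
```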
The quest to embed all resources in the snapshot continues. See https://github.com/rrweb-io/rrweb/issues/737 (saving images offline as base64). Currently there's no way that I know of to include the CSS...
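rrweb itself is JavaScript, so this is only a Python sketch of the general technique the issue refers to: reading an image's bytes and inlining them as a base64 `data:` URI, so the snapshot no longer depends on fetching the image over the network. `to_data_uri` is a hypothetical helper, not part of rrweb:

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data: URI that can be inlined in a snapshot."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


print(to_data_uri(b"\x89PNG", "pixel.png"))  # data:image/png;base64,iVBORw==
```

The trade-off is size: base64 inflates binary data by roughly a third, which is the price of a fully self-contained snapshot.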
Hi. I wanted to share these two scripts that I use myself. They show how to use rrweb, and the script for viewing snapshots is really useful in...
Hello, I love the idea of your app and I really want to see it work! I have IPFS installed; I tried version 0.3.10, but your app needs the latest version. So...
See issue https://github.com/scrapinghub/autoextract-spiders/issues/6. Usage: `scrapy crawl articles -a seeds=... -a dates=2019-11 ...`, or with a list of dates: `scrapy crawl articles -a seeds=... -a dates=['2019-09', '2019-10'] ...`. Any rule...
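Scrapy passes `-a` arguments to the spider as plain strings, so the `dates` value has to be parsed before use. A minimal sketch, assuming a hypothetical `parse_dates_arg` helper that accepts either a single month or a Python-style list literal:

```python
import ast


def parse_dates_arg(value: str) -> list[str]:
    """Parse a `dates` spider argument: one 'YYYY-MM' string or a list literal."""
    value = value.strip()
    if value.startswith("["):
        # ast.literal_eval only evaluates Python literals, so no arbitrary code runs.
        dates = ast.literal_eval(value)
        if not isinstance(dates, list) or not all(isinstance(d, str) for d in dates):
            raise ValueError(f"expected a list of strings, got {value!r}")
        return dates
    # A single value such as '2019-11' becomes a one-element list.
    return [value]


print(parse_dates_arg("2019-11"))                 # ['2019-11']
print(parse_dates_arg("['2019-09', '2019-10']"))  # ['2019-09', '2019-10']
```

In most shells the list form needs quoting, e.g. `-a "dates=['2019-09', '2019-10']"`, because the unquoted space would split it into two separate arguments.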
When discovering URLs from different seeds, URLs found under more than one seed are not deduplicated. There is local (per-seed) deduplication during discovery, and there are also the built-in DupeFilters....
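A minimal sketch of cross-seed deduplication, assuming discovery results are available as a mapping from each seed to the URLs found under it (a hypothetical shape, not the spider's actual data structure). Sharing a single `seen` set across all seeds means each URL is yielded only once, under the first seed that found it:

```python
from collections.abc import Iterable, Iterator


def discover_unique(seeds_to_urls: dict[str, Iterable[str]]) -> Iterator[tuple[str, str]]:
    """Yield (seed, url) pairs, skipping URLs already seen under any earlier seed."""
    seen = set()  # one set shared by all seeds, not one per seed
    for seed, urls in seeds_to_urls.items():
        for url in urls:
            if url in seen:
                continue
            seen.add(url)
            yield seed, url


found = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/news": ["https://example.com/b", "https://example.com/c"],
}
for seed, url in discover_unique(found):
    print(seed, url)
```

This sketch compares URLs as exact strings; it does not normalize them (trailing slashes, query parameter order), which a real spider might want to do first.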
It would be useful to expose a parameter in the spider that keeps only articles matching a certain date. This could be as simple as a regex to match against the...
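A minimal sketch of a regex-based date filter. It assumes each article is a dict whose publication date is stored under `datePublished` as an ISO-like string; that field name and the `make_date_filter` helper are assumptions for illustration:

```python
import re
from collections.abc import Callable


def make_date_filter(pattern: str) -> Callable[[dict], bool]:
    """Build a predicate that keeps articles whose date matches `pattern`."""
    regex = re.compile(pattern)

    def keep(article: dict) -> bool:
        # Assumption: the date lives under 'datePublished'; missing dates never match.
        date = article.get("datePublished") or ""
        # re.match anchors at the start, so '2019-1[01]' matches '2019-11-05T...'.
        return regex.match(date) is not None

    return keep


articles = [
    {"headline": "A", "datePublished": "2019-11-05T10:00:00"},
    {"headline": "B", "datePublished": "2020-01-02"},
    {"headline": "C"},
]
keep = make_date_filter(r"2019-1[01]")
print([a["headline"] for a in articles if keep(a)])  # ['A']
```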
Hi. This doesn't work: `dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant'])` This works: `dateparser.parse('2020年9月1日 下午6:25', languages=['zh'])` This also works: `dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant', 'zh'])` It's weird because I can see all the info in https://github.com/scrapinghub/dateparser/blob/master/dateparser_data/cldr_language_data/date_translation_data/zh-Hant.json...