Cristÿ Constantin issues

Results: 8 issues of Cristÿ Constantin

Ignore empty OpenGraph props

Ignore empty OpenGraph properties or content. Fixes #117. Upgraded six>=1.11 to make Python 3.4 tests pass. This was also done in https://github.com/scrapinghub/extruct/pull/119

Snapshot CSS background-image URL as base64

The quest for embedding all resources in the snapshot continues. See: https://github.com/rrweb-io/rrweb/issues/737 (saving images offline as base64). Currently there's no way that I know of to include the CSS...

Added scripts for saving and viewing snapshots

Hi. I wanted to share these 2 useful scripts that I actually use. They can showcase how to use rrWeb, and the script for viewing snapshots is really useful in...

I can't go past setting the profile

Hello, I love the idea of your app and I really want to see it work! I have IPFS; I tried version 0.3.10, but your app needs the latest. So...

bug

API

Implemented date filter rules, specified as spider arg

See issue https://github.com/scrapinghub/autoextract-spiders/issues/6
Usage:
> scrapy crawl articles -a seeds=... -a dates=2019-11 ...
Or a list of dates:
> scrapy crawl articles -a seeds=... -a dates=['2019-09', '2019-10'] ...
Any rule...
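As a rough illustration of how a `dates` argument in either of the two forms above could be normalized, here is a minimal sketch. The helper name `parse_dates_arg` is hypothetical and is not taken from the pull request; it only assumes that the spider receives the argument as a raw string.

```python
import ast


def parse_dates_arg(raw):
    """Normalize a raw `dates` spider argument into a list of strings.

    Hypothetical helper: accepts either a single value ("2019-11")
    or a Python-style list literal ("['2019-09', '2019-10']").
    """
    raw = raw.strip()
    if raw.startswith("["):
        # literal_eval only evaluates literals, so it is safe on user input
        values = ast.literal_eval(raw)
        return [str(value) for value in values]
    return [raw]


single = parse_dates_arg("2019-11")
several = parse_dates_arg("['2019-09', '2019-10']")
```

Depending on the shell, the list form may need to be quoted on the command line so that it reaches the spider as one string.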

Better de-duplication of URLs

When discovering URLs from different seeds, the URLs are not deduplicated if they are found in multiple seeds. There is local de-duplication during discovery, and there's also the built-in DupeFilters....
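One way to picture de-duplication across seeds is a single shared "seen" set, as in the sketch below. This is only an assumption about the general idea, not the approach taken in the issue; the function name `dedupe_across_seeds` and the input shape (a dict mapping each seed to its discovered URLs) are hypothetical.

```python
from urllib.parse import urldefrag


def dedupe_across_seeds(urls_by_seed):
    """Return discovered URLs in first-seen order, without duplicates.

    Hypothetical helper: `urls_by_seed` maps a seed to the list of URLs
    discovered from it. URLs that differ only by fragment are treated
    as the same page.
    """
    seen = set()
    unique = []
    for urls in urls_by_seed.values():
        for url in urls:
            key = urldefrag(url).url  # strip any "#fragment"
            if key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


discovered = {
    "seed-a": ["https://example.com/x", "https://example.com/y#top"],
    "seed-b": ["https://example.com/y", "https://example.com/z"],
}
unique_urls = dedupe_across_seeds(discovered)
```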

Filter extracted articles by date

It's useful to expose a param in the spider to only keep articles that match a certain date. This could be as simple as a regex, to match against the...
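The regex idea could look roughly like the sketch below. The function name `keep_by_date` and the `datePublished` field are assumptions made for illustration; the issue text is truncated before it says which field the pattern is matched against.

```python
import re


def keep_by_date(articles, pattern, field="datePublished"):
    """Keep only the articles whose date field starts with `pattern`.

    Hypothetical helper: `articles` is a list of dicts, and `field`
    is an assumed name for the key holding the date string.
    """
    regex = re.compile(pattern)
    # Articles with a missing or empty date never match
    return [a for a in articles if regex.match(str(a.get(field) or ""))]


articles = [
    {"headline": "First", "datePublished": "2019-11-03T10:00:00"},
    {"headline": "Second", "datePublished": "2019-10-01"},
    {"headline": "No date"},
]
november = keep_by_date(articles, r"2019-11")
```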

enhancement

Issue parsing ZH-Hant locale

Hi.
This doesn't work: `dateparser.parse('2020年9月1日下午6:25', languages=['zh-Hant'])`
This works: `dateparser.parse('2020年9月1日下午6:25', languages=['zh'])`
This also works: `dateparser.parse('2020年9月1日下午6:25', languages=['zh-Hant', 'zh'])`
It's weird because I can see all the info in https://github.com/scrapinghub/dateparser/blob/master/dateparser_data/cldr_language_data/date_translation_data/zh-Hant.json...