Peter Inglesby issues

Results: 9 issues of Peter Inglesby

schemes that attempt to reload modules are blocked by the inspection registry

### Describe the bug

There's a nasty interaction between the following:

* some import-time caching in SQLAlchemy (in [`sqlalchemy.inspection._registrars`](https://github.com/sqlalchemy/sqlalchemy/blob/66be1482db06adb908432b2e3b41d9393d1319f7/lib/sqlalchemy/inspection.py#L53)),
* the way Coverage.py handles editable installs, and
* the way...

bug

orm

PRs (with tests!) welcome

external library/application issues

Add Keane to AUTHORS

@keaneokelley Can you give me a thumbs up/down if you'd like this merged?

Tests are flaky

E.g. https://travis-ci.org/inglesp/http-crawler/jobs/285774955

Should be able to identify invalid URL schemes

#12 means that now we ignore URL schemes that cannot be handled by `requests`, but we should be able to identify mistyped URL schemes. See discussion in #6.
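One possible way to tell a mistyped scheme from a merely unsupported one is fuzzy matching against a list of known schemes. The sketch below is only an illustration: `KNOWN_SCHEMES`, `classify_scheme`, and the 0.7 cutoff are assumptions made here, not part of the library or of the discussion in #6.

```python
import difflib
from urllib.parse import urlsplit

# Assumed list of schemes a crawler might recognise; not taken from the library.
KNOWN_SCHEMES = ["http", "https", "mailto", "tel", "ftp", "data"]


def classify_scheme(url):
    """Return ("known", scheme), ("typo", suggestion) or ("unknown", scheme)."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in KNOWN_SCHEMES:
        return ("known", scheme)
    # A close match to a known scheme suggests a typo (e.g. "htp" for "http").
    close = difflib.get_close_matches(scheme, KNOWN_SCHEMES, n=1, cutoff=0.7)
    if close:
        return ("typo", close[0])
    return ("unknown", scheme)
```

With this approach a crawler could report `typo` results as errors while still silently ignoring `unknown` schemes.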

Allow user to choose how links are extracted from responses

We currently extract links from HTML by looking for `src` and `href` attributes, and from CSS by looking for `@import` rules and `URI` tokens. A user might want to extract...
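For illustration, the HTML half of that behaviour (collecting `src` and `href` attribute values) can be sketched with the standard library alone. `LinkExtractor` and `extract_html_links` are hypothetical names, this is not the library's actual implementation, and CSS handling is left out.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the values of src and href attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def extract_html_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A user-selectable extractor could then be any callable with the same shape as `extract_html_links`: it takes the response text and returns a list of link strings.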

Allow user to choose whether to follow redirects

We currently use `requests`'s default behaviour of following redirects. A user might not always want this, as they might want to use the library to find unnecessary redirects on a...

Allow user to choose which links to follow

We currently follow all links, but in some cases this might not be appropriate. We should find a way to allow the user to configure which links to follow.
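One way this could look is a user-supplied predicate that the crawler consults before following each link. The sketch below builds such a predicate from regular expressions; `make_link_filter`, `allow`, and `deny` are hypothetical names, not an existing API.

```python
import re


def make_link_filter(allow=(), deny=()):
    """Build a predicate that decides whether a URL should be followed.

    deny patterns take precedence; if allow is empty, everything
    that is not denied is followed.
    """
    allow_res = [re.compile(p) for p in allow]
    deny_res = [re.compile(p) for p in deny]

    def should_follow(url):
        if any(r.search(url) for r in deny_res):
            return False
        if allow_res:
            return any(r.search(url) for r in allow_res)
        return True

    return should_follow
```

For example, `make_link_filter(deny=[r"/logout"])` would follow every link except those containing `/logout`.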

Allow user to choose which pages to extract links from

We currently extract links from all pages that are on the same domain as the original URL that is passed to `crawl`. This might be too narrow (for instance, a...
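A same-domain check of the kind described can be sketched by comparing the network-location part of two URLs. This assumes "same domain" means an identical host (and port), which may not match the library's exact rule; `same_domain` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def same_domain(url, root_url):
    """Return True if url has the same host (and port) as root_url."""
    # Assumption: hosts are compared case-insensitively and subdomains count as different.
    return urlsplit(url).netloc.lower() == urlsplit(root_url).netloc.lower()
```

Making this check replaceable by a user-supplied callable would be one way to let users widen or narrow the set of pages from which links are extracted.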

Restructure everything to use django-amber

I have kept the same directory structure for content, templates, and media.