url-normalize
url-normalize copied to clipboard
Normalize absolute urls with default domain name.
It should be possible to specify a default_scheme similar to the already possible default_scheme.
E.g.
>>> url_normalize('/foo.png', default_scheme='https', default_domain='example.com')
'https://example.com/foo.png'
Currently that is not possible:
>>> url_normalize('/foo.png', default_scheme='https')
'/foo.png'
This could be done with the following
def provide_url_scheme(url, default_scheme=DEFAULT_SCHEME, default_host=None):
has_scheme = ":" in url[:7]
is_universal_scheme = url.startswith("//")
is_stdout = url == "-"
is_file_path = is_stdout or (url.startswith("/") and not is_universal_scheme)
if default_host and is_file_path and not is_stdout:
if is_universal_scheme:
return default_scheme + ":" + default_host + url
return default_scheme + "://" + default_host + url
if not url or has_scheme or is_file_path:
return url
if is_universal_scheme:
return default_scheme + ":" + url
return default_scheme + "://" + url
All that has to be added to url_normalize(â¦) is:
def url_normalize(
- url, charset=DEFAULT_CHARSET, default_scheme=DEFAULT_SCHEME, sort_query_params=True
+ url, charset=DEFAULT_CHARSET, default_scheme=DEFAULT_SCHEME, default_host=None, sort_query_params=True
):
if not url:
return url
- url = provide_url_scheme(url, default_scheme)
+ url = provide_url_scheme(url, default_scheme, default_host)
Which could look like this in the end:
>>> url_normalize('/blahbla/whatever.png', default_host='example.com')
'https://example.com/blahbla/whatever.png'