purl
purl copied to clipboard
Add resolve() method (or function)
Here's the helper function I wrote:
from purl import URL
from urllib import parse
def resolve(base, url):
"""
Resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF
:param base: str|URL
:param url: str|URL
:return: URL
"""
if isinstance(base, URL):
baseurl = base
else:
baseurl = URL(base)
if isinstance(url, URL):
relurl = url
else:
relurl = URL(url)
if relurl.host():
return relurl
if relurl.path():
return URL(
scheme=baseurl.scheme(),
host=baseurl.host(),
port=baseurl.port(),
path=parse.urljoin(baseurl.path(), relurl.path()),
query=relurl.query(),
fragment=relurl.fragment(),
)
elif relurl.query() or '?' in url:
return URL(
scheme=baseurl.scheme(),
host=baseurl.host(),
port=baseurl.port(),
path=baseurl.path(),
query=relurl.query(),
fragment=relurl.fragment(),
)
elif relurl.fragment() or '#' in url:
return URL(
scheme=baseurl.scheme(),
host=baseurl.host(),
port=baseurl.port(),
path=baseurl.path(),
query=baseurl.query(),
fragment=relurl.fragment(),
)
return baseurl
Usage:
>>> base = URL('http://user:[email protected]:8080/path/to/some/doc.html?q=query#frag')
...
>>> print(resolve(base, '../home'))
http://user:[email protected]:8080/path/to/home
>>> print(resolve(base, 'doc2.html'))
http://user:[email protected]:8080/path/to/some/doc2.html
>>> print(resolve(base, '?'))
http://user:[email protected]:8080/path/to/some/doc.html
>>> print(resolve(base, '?q=git'))
http://user:[email protected]:8080/path/to/some/doc.html?q=git
>>> print(resolve(base, '#'))
http://user:[email protected]:8080/path/to/some/doc.html?q=query
Am not sure about this one. I can see how the ../home request translates to a path change but values like ? already have an API in purl (setting the query string to an empty string).
Have you seen an API like this in another URL lib?
Yes. url module in NodeJS: https://www.npmjs.com/package/url
resolve() is more than just urllib.parse.urljoin(). It resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF.
Examples:
-
<a href="../home">Go home</a>=> changepath+ clearquery+ clearfragment. -
<a href="?">Page 1</a>=> clearquery+ clearfragment.
Very useful for HTML parsing.
Interesting. I'll add a similar method.