purl icon indicating copy to clipboard operation
purl copied to clipboard

Add resolve() method (or function)

Open pix666 opened this issue 7 years ago • 3 comments

Here's the helper function I wrote:

from purl import URL
from urllib import parse

def resolve(base, url):
    """
    Resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF
    :param base: str|URL
    :param url: str|URL
    :return: URL
    """
    if isinstance(base, URL):
        baseurl = base
    else:
        baseurl = URL(base)

    if isinstance(url, URL):
        relurl = url
    else:
        relurl = URL(url)

    if relurl.host():
        return relurl

    if relurl.path():
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=parse.urljoin(baseurl.path(), relurl.path()),
            query=relurl.query(),
            fragment=relurl.fragment(),
        )
    elif relurl.query() or '?' in url:
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=baseurl.path(),
            query=relurl.query(),
            fragment=relurl.fragment(),
        )
    elif relurl.fragment() or '#' in url:
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=baseurl.path(),
            query=baseurl.query(),
            fragment=relurl.fragment(),
        )
    return baseurl

Usage:

>>> base = URL('http://user:[email protected]:8080/path/to/some/doc.html?q=query#frag')
...
>>> print(resolve(base, '../home'))
http://user:[email protected]:8080/path/to/home

>>> print(resolve(base, 'doc2.html'))
http://user:[email protected]:8080/path/to/some/doc2.html

>>> print(resolve(base, '?'))
http://user:[email protected]:8080/path/to/some/doc.html

>>> print(resolve(base, '?q=git'))
http://user:[email protected]:8080/path/to/some/doc.html?q=git

>>> print(resolve(base, '#'))
http://user:[email protected]:8080/path/to/some/doc.html?q=query

pix666 avatar Apr 28 '18 11:04 pix666

Am not sure about this one. I can see how the ../home request translates to a path change but values like ? already have an API in purl (setting the query string to an empty string).

Have you seen an API like this in another URL lib?

codeinthehole avatar Apr 29 '18 20:04 codeinthehole

Yes. url module in NodeJS: https://www.npmjs.com/package/url

resolve() is more than just urllib.parse.urljoin(). It resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF.

Examples:

  1. <a href="../home">Go home</a> => change path + clear query + clear fragment.
  2. <a href="?">Page 1</a> => clear query + clear fragment.

Very useful for HTML parsing.

pix666 avatar Apr 30 '18 05:04 pix666

Interesting. I'll add a similar method.

codeinthehole avatar May 07 '18 20:05 codeinthehole