PyDomainExtractor
extract_from_url should handle url without protocol-scheme
How to reproduce:
Call extract_from_url with //mail.google.com/mail as input.
The result is an Invalid Domain Error.
Expected behavior: handle the missing protocol scheme and return {subdomain: mail, domain: google, suffix: com}.
Technically this is not a valid URI but a URI reference: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#URI_references
We can support it, but only when the input starts with //, so that we can distinguish between valid and invalid URIs.
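To illustrate why the leading // matters, here is a minimal sketch using only Python's standard urllib.parse, not PyDomainExtractor's own parser. The helper name host_from_reference is hypothetical and exists only for this example. With the leading //, the input is a network-path reference and the host can be identified; without it, the whole string is parsed as a path and no host is found.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_from_reference(url: str) -> Optional[str]:
    """Return the host of a URI or network-path reference, else None.

    Hypothetical helper for illustration; it is not part of PyDomainExtractor.
    """
    # urlsplit only fills in the network location when the input contains
    # "//" (either after a scheme or at the very start of the string).
    return urlsplit(url).hostname


# Full URI: scheme and host are both present.
print(host_from_reference("https://mail.google.com/mail"))

# Network-path reference: no scheme, but "//" marks the host.
print(host_from_reference("//mail.google.com/mail"))

# No scheme and no "//": the whole string is treated as a path.
print(host_from_reference("mail.google.com/mail"))
```

The first two calls yield mail.google.com, while the last yields None, which is the ambiguity the // prefix resolves.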
Tldextract can extract with schemes

It appears that extract_from_url("//mail.google.com/mail") now works.