tldextract
Incorrect empty FQDN when no subdomain and domain
When both the subdomain and the domain of an extract result are empty, its fqdn property returns an empty string, even though the suffix on its own is a valid FQDN.
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.__version__
'2.2.0'
>>> tldextract.extract('http://s3.amazonaws.com/')
ExtractResult(subdomain='', domain='', suffix='s3.amazonaws.com')
>>> tldextract.extract('http://s3.amazonaws.com/').fqdn
''
>>> tldextract.extract('http://blogspot.com/')
ExtractResult(subdomain='', domain='', suffix='blogspot.com')
>>> tldextract.extract('http://blogspot.com/').fqdn
''
#174 is a related issue, but it's not the same problem.
Is this check really necessary? https://github.com/john-kurkowski/tldextract/blob/d6574145f76d916ce978eeb898c9e022cf31b87f/tldextract/tldextract.py#L121-L124
Ugh, that is unintuitive. Private domains strike again. The check was originally for e.g. tldextract.extract('localhost'), where there is no FQDN to reconstruct. It didn't consider private domains.
As suggested in this comment in #138, if we add a way for each suffix to know whether it is private or not, we could change the check to something like this:
if self.suffix and (self.domain or self.is_private):
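To make the difference concrete, here is a rough, self-contained sketch that contrasts the existing check with the proposed one. It does not use tldextract itself: the ExtractResult class below is a simplified stand-in, the is_private field is hypothetical (nothing like it exists in 2.2.0), and fqdn_current only approximates the check linked above.

```python
from typing import NamedTuple


class ExtractResult(NamedTuple):
    """Simplified stand-in for tldextract's result type (not the real class)."""

    subdomain: str
    domain: str
    suffix: str
    is_private: bool = False  # hypothetical field, assumed for this sketch

    def _join(self) -> str:
        # Join the non-empty parts with dots.
        parts = (self.subdomain, self.domain, self.suffix)
        return ".".join(p for p in parts if p)

    @property
    def fqdn_current(self) -> str:
        # Approximates the existing check: both domain and suffix are required.
        if self.domain and self.suffix:
            return self._join()
        return ""

    @property
    def fqdn_proposed(self) -> str:
        # Proposed check: a private suffix on its own also counts as an FQDN.
        if self.suffix and (self.domain or self.is_private):
            return self._join()
        return ""


s3 = ExtractResult("", "", "s3.amazonaws.com", is_private=True)
localhost = ExtractResult("", "localhost", "")

print(repr(s3.fqdn_current))          # empty under the existing check
print(repr(s3.fqdn_proposed))         # suffix alone under the proposed check
print(repr(localhost.fqdn_proposed))  # still empty: no suffix to rebuild from
```

With this change, the localhost case that motivated the original check still yields an empty string, while a private suffix with no registered domain is returned as its own FQDN.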