Incorrect empty FQDN when no subdomain and domain

Open • BookGin opened this issue 6 years ago • 1 comment

When an extract result's subdomain and domain are both empty, its fqdn property returns an empty string, even though the suffix is a valid hostname on its own.

Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.__version__
'2.2.0'
>>> tldextract.extract('http://s3.amazonaws.com/')
ExtractResult(subdomain='', domain='', suffix='s3.amazonaws.com')
>>> tldextract.extract('http://s3.amazonaws.com/').fqdn
''
>>> tldextract.extract('http://blogspot.com/')
ExtractResult(subdomain='', domain='', suffix='blogspot.com')
>>> tldextract.extract('http://blogspot.com/').fqdn
''

#174 is a related issue, but it's not the same.

Is this check really necessary? https://github.com/john-kurkowski/tldextract/blob/d6574145f76d916ce978eeb898c9e022cf31b87f/tldextract/tldextract.py#L121-L124

BookGin, Oct 27 '19 14:10

Ugh, that is unintuitive. Private domains strike again. The check was originally for e.g. tldextract.extract('localhost'), where there is no FQDN to reconstruct. It didn't consider private domains.

As suggested in this comment in #138, if we add a way for each suffix to know whether it is private or not, we could change the check to something like this:

if self.suffix and (self.domain or self.is_private):
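To make the effect of that condition concrete, here is a minimal sketch using a stand-in class rather than tldextract itself. SketchResult and its is_private field are hypothetical names chosen for illustration; they assume each result carries a flag saying whether its suffix came from the private-domain list, which the library did not expose as of 2.2.0.

```python
from typing import NamedTuple


class SketchResult(NamedTuple):
    """Hypothetical stand-in for ExtractResult; not tldextract's actual class."""

    subdomain: str
    domain: str
    suffix: str
    # Assumed flag: True when the suffix comes from the private-domain list.
    is_private: bool = False

    @property
    def fqdn(self) -> str:
        # Proposed check: a suffix is required, plus either a domain
        # or the knowledge that the suffix itself is a private domain.
        if self.suffix and (self.domain or self.is_private):
            parts = (self.subdomain, self.domain, self.suffix)
            return ".".join(part for part in parts if part)
        return ""


# Private suffix with no domain: the case reported in this issue.
private_only = SketchResult("", "", "s3.amazonaws.com", is_private=True)
# Public suffix with no domain: still nothing to reconstruct.
public_only = SketchResult("", "", "com")
# No suffix at all, as with 'localhost': the case the check was written for.
no_suffix = SketchResult("", "localhost", "")
# Ordinary hostname, unchanged by the new condition.
regular = SketchResult("www", "example", "com")

for result in (private_only, public_only, no_suffix, regular):
    print(repr(result.fqdn))
```

Under these assumptions the private-suffix case yields 's3.amazonaws.com' instead of an empty string, while the 'localhost' case and a bare public suffix still yield ''.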

john-kurkowski, Nov 27 '19 19:11