tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).

Results 37 tldextract issues

`PUBLIC_SUFFIX_LIST_URLS` can currently only be defined through function arguments. Can it also be defined through the environment?
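One way a caller might work around this today is to read the URLs from the environment and pass them to the constructor. The sketch below assumes a comma-separated environment variable named `PUBLIC_SUFFIX_LIST_URLS`; that variable name and both helper functions are hypothetical and are not read or provided by tldextract. The only library feature relied on is the `suffix_list_urls` argument of `tldextract.TLDExtract`.

```python
import os

# Hypothetical variable name chosen for this sketch; tldextract does not read it.
ENV_VAR = "PUBLIC_SUFFIX_LIST_URLS"


def suffix_list_urls_from_env(default=()):
    """Parse a comma-separated list of URLs from the environment into a tuple."""
    raw = os.environ.get(ENV_VAR, "")
    urls = tuple(part.strip() for part in raw.split(",") if part.strip())
    # Fall back to the caller-supplied default when the variable is unset or empty.
    return urls or tuple(default)


def build_extractor():
    """Create an extractor whose suffix list URLs come from the environment."""
    import tldextract  # imported lazily so the parsing helper has no dependency

    urls = suffix_list_urls_from_env()
    if urls:
        return tldextract.TLDExtract(suffix_list_urls=urls)
    # No override set: keep the library's default sources.
    return tldextract.TLDExtract()
```

With `PUBLIC_SUFFIX_LIST_URLS="file:///srv/psl/public_suffix_list.dat"` exported, `build_extractor()` would return an extractor pointed at that single source; the path shown is only an illustration.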

For URLs using IPv4 addresses, the host address gets extracted correctly using `.domain` but `.fqdn` gives the empty string:

```python
>>> tldextract.extract("https://127.0.0.1:1234/foobar").domain
'127.0.0.1'
>>> tldextract.extract("https://127.0.0.1:1234/foobar").fqdn
''
```

For URLs using...
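Until the behavior described in this report changes, a caller could detect IP-address hosts separately before relying on `.fqdn`. The following is a minimal standard-library sketch of that idea, not part of tldextract; the helper name `host_if_ip` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def host_if_ip(url):
    """Return the URL's host if it is an IPv4 or IPv6 literal, else None."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        # Raises ValueError for anything that is not a valid IP address.
        ipaddress.ip_address(host)
    except ValueError:
        return None
    return host
```

A caller might use `host_if_ip(url)` first and fall back to the extractor's `.fqdn` only when it returns `None`.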

I recently used this library to extract the second-level domain names from the 1.2 billion PTR records in Rapid7's [Sonar](https://opendata.rapid7.com/sonar.rdns_v2/) database. As an example it would extract ``totinternet`` from ``node-nnf.pool-1-1.dynamic.totinternet.net``....

I start getting this error when I increase the number of processes / threads to a certain point. Is there a way to increase the timeout value? More importantly, why...

In `tldextract/remote.py`, it should be possible to pass in a custom instance of `Session` from `requests`. That change is needed, followed by changes up the call stack so that this custom session can be...

help wanted
good first issue
see issue #158 [one fell swoop]

While it's understandable and useful in many situations to want the latest dataset, it can cause issues in some situations: - ephemeral environments that will not be able to cache...

For reading `.tld_set`, `__file__` should be replaced with `pkgutil.get_data`, which is used for `.tld_set_snapshot`. The dataset file may have been placed into the package data via another means, and the...

Reconsider caching in the library's install folder. The GitHub issue tracker is rife with confusion about the permission warning (#9), or outright uncaught exceptions (#209). Finally do something about it....

How can we exclude a TLD from the PSLs loaded by the library? Is there a setting for that?

Problem: We have an application that performs TLD operations using Celery workers. Because we have several Celery workers, whenever the cache_file update is called, it only updates the file...