PyDomainExtractor
A blazingly fast domain extraction library written in Rust
tldextract handles IP addresses and URLs that include an HTTP scheme, while this extractor can't. Speed doesn't matter in this case; what matters is the correctness of the scraped data.
How to reproduce: call ```extract_from_url``` with ```//mail.google.com/mail``` as input. Result: Invalid Domain Error. Expected behavior: handle the missing protocol and return ```{subdomain: mail, domain:...
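Until the library handles scheme-relative input itself, a caller could normalize such URLs first. The following is a minimal sketch using only the standard library; the helper name `with_default_scheme` and the choice of `http` as the default are assumptions for illustration, not part of PyDomainExtractor.

```python
def with_default_scheme(url: str, default_scheme: str = "http") -> str:
    """Prepend a scheme to scheme-relative URLs such as //host/path.

    Hypothetical helper: not part of PyDomainExtractor. URLs that do not
    start with "//" are returned unchanged.
    """
    if url.startswith("//"):
        # "//mail.google.com/mail" -> "http://mail.google.com/mail"
        return f"{default_scheme}:{url}"
    return url


normalized = with_default_scheme("//mail.google.com/mail")
print(normalized)
```

The normalized string could then be passed to ```extract_from_url```; whether that avoids the error depends on the installed version.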
How to reproduce: call ```extract_from_url``` with ```http://127.0.0.1``` as input. Result: ```{subdomain: 127.0.0, domain: 1}```. Expected behavior: throw Invalid Domain Error.
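As a possible caller-side guard, the host can be checked for being an IP literal before extraction. This sketch relies only on the standard library `urllib.parse` and `ipaddress` modules; the function name `host_is_ip_address` is hypothetical and not part of PyDomainExtractor.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True when the URL's host is an IPv4 or IPv6 literal.

    Hypothetical helper: not part of PyDomainExtractor. URLs without a
    scheme or leading "//" have no parsed host and yield False.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a valid IP literal, so treat it as a regular hostname.
        return False
    return True


print(host_is_ip_address("http://127.0.0.1"))
print(host_is_ip_address("http://mail.google.com/mail"))
```

A caller could skip extraction, or raise its own error, whenever this check returns True.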
Good morning, you could enhance the dictionary structure by providing 3 more elements: - **fqdn:** which is a mapping of subdomain.domain.suffix - **registered domain:** which is a mapping of domain.suffix...
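The two elements named above can be derived from the existing parts. The sketch below assumes the result is a dictionary with ```subdomain```, ```domain```, and ```suffix``` keys; the function name and the ```fqdn``` / ```registered_domain``` key names are hypothetical, and the third requested element is not covered because it is not spelled out here.

```python
def with_derived_fields(parts: dict) -> dict:
    """Add fqdn and registered_domain to an extraction result.

    Hypothetical helper: assumes `parts` has "subdomain", "domain" and
    "suffix" keys. Empty parts are skipped so no stray dots appear.
    """
    subdomain = parts.get("subdomain", "")
    domain = parts.get("domain", "")
    suffix = parts.get("suffix", "")

    registered_domain = ".".join(p for p in (domain, suffix) if p)
    fqdn = ".".join(p for p in (subdomain, domain, suffix) if p)

    # Return a new dict so the original result is left untouched.
    return {**parts, "registered_domain": registered_domain, "fqdn": fqdn}


result = with_derived_fields(
    {"subdomain": "mail", "domain": "google", "suffix": "com"}
)
print(result["fqdn"], result["registered_domain"])
```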
Hello, Would it be possible to add a Python 3.12 wheel, so that the lib could be easily installed in a Python 3.12 venv?
From the README > The test was conducted on a file containing 1 million random urls (Mar. 13rd 2022) It would be good to add a hyperlink to the specific...
Updating cortex descriptor