url-normalize
url-normalize copied to clipboard
Twitter hashtag search breaks on normalization
Here's a sample Twitter search with a hashtag: https://twitter.com/search?q=%23cncmachining&src=typed_query
When I run it through url_normalization, the encoded hash character (%23) is decoded into a hash (#), but it should stay encoded, because when I visit the normalized url, it 404s.
>>> from url_normalize import url_normalize
>>> url_normalize("https://twitter.com/search?q=%23cncmachining&src=typed_query")
'https://twitter.com/search?q=#cncmachining&src=typed_query'
When you visit them in the browser: