Twitter hashtag search breaks on normalization

Open samuelclay opened this issue 3 years ago • 0 comments

Here's a sample Twitter search with a hashtag: https://twitter.com/search?q=%23cncmachining&src=typed_query

When I run it through url_normalize, the encoded hash character (%23) is decoded into a literal hash (#). It should stay encoded: a literal # starts the URL fragment, so everything after it is no longer part of the query, and the normalized URL 404s when I visit it.

>>> from url_normalize import url_normalize
>>> url_normalize("https://twitter.com/search?q=%23cncmachining&src=typed_query")
'https://twitter.com/search?q=#cncmachining&src=typed_query'
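To see why the output above is not equivalent to the input, here is a small check using only the standard library's `urllib.parse`. It is a diagnostic sketch, not part of url_normalize; the variable names are mine.

```python
from urllib.parse import urlsplit, parse_qs

original = "https://twitter.com/search?q=%23cncmachining&src=typed_query"
normalized = "https://twitter.com/search?q=#cncmachining&src=typed_query"

orig_parts = urlsplit(original)
norm_parts = urlsplit(normalized)

# Encoded form: "%23" is data inside the query string, and there is no fragment.
print(orig_parts.query)     # q=%23cncmachining&src=typed_query
print(orig_parts.fragment)  # (empty string)

# Decoded form: the literal "#" ends the query and starts the fragment.
print(norm_parts.query)     # q=
print(norm_parts.fragment)  # cncmachining&src=typed_query

# Parsed query parameters for each form.
orig_params = parse_qs(orig_parts.query)
norm_params = parse_qs(norm_parts.query)
print(orig_params)  # {'q': ['#cncmachining'], 'src': ['typed_query']}
print(norm_params)  # {}
```

In the normalized form the search term and the `src` parameter both end up in the fragment, which browsers do not send to the server.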

When you visit the two URLs in the browser:

https://twitter.com/search?q=#cncmachining&src=typed_query doesn't work
https://twitter.com/search?q=%23cncmachining&src=typed_query does work
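For reference, the behavior I would expect looks roughly like the sketch below: decode only percent-escapes of unreserved characters (as RFC 3986 section 6.2.2.2 describes) and leave everything else encoded. `decode_unreserved_only` is a hypothetical helper written for this issue, not an existing url_normalize function, and I have not checked how the library is structured internally.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved_only(component: str) -> str:
    """Hypothetical helper: decode escapes of unreserved characters only.

    Escapes of any other character (such as %23 for "#") are kept,
    with their hex digits uppercased.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT_ESCAPE.sub(replace, component)


# %23 ("#") stays encoded, while %5F ("_") is safely decoded.
result = decode_unreserved_only("q=%23cncmachining&src=typed%5Fquery")
print(result)  # q=%23cncmachining&src=typed_query
```

This only covers ASCII escapes in a single component; a real fix would also need to handle multi-byte UTF-8 sequences, which this sketch simply leaves encoded.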

Jun 21 '22 20:06 samuelclay