fix: character issues with umlauts
link to issue: https://github.com/TibiaData/tibiadata-api-go/issues/470
- Wrap incoming HTML in charset.NewReader before goquery parsing
- Ensures ISO‑8859‑1 (and other legacy) input is normalized to UTF‑8
- Prevents “mojibake” (e.g. “Ã¤” instead of “ä”)
- Updated TestWorldAntica to simulate Latin‑1 input and verify correct Umlaut decoding
- Added Antica.html for parsing character Näurin
Closes #470
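To illustrate the mojibake described above, here is a minimal standard-library sketch. The helper `latin1ToUTF8` is hypothetical and is not code from this repository; it relies only on the fact that every ISO-8859-1 byte value equals its Unicode code point. It shows both the correct decoding of Latin-1 input and what happens when the same conversion is applied to input that is already UTF-8.

```go
package main

import "fmt"

// latin1ToUTF8 is a hypothetical helper for illustration only.
// It decodes ISO-8859-1 bytes into a UTF-8 Go string by mapping
// each byte directly to the Unicode code point of the same value.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	// "Näurin" encoded as ISO-8859-1: ä is the single byte 0xE4.
	latin1 := []byte{'N', 0xE4, 'u', 'r', 'i', 'n'}
	fmt.Println(latin1ToUTF8(latin1)) // Näurin

	// The same conversion applied to bytes that are already UTF-8
	// double-encodes the two bytes of ä (0xC3 0xA4) as two characters.
	alreadyUTF8 := []byte("Näurin")
	fmt.Println(latin1ToUTF8(alreadyUTF8)) // NÃ¤urin
}
```

The second call is why the conversion has to match the encoding the server actually sends: converting unconditionally is only correct while the input really is Latin-1.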
Quality Gate passed
- Issues: 0 new issues, 0 accepted issues
- Measures: 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code
@tobiasehlert I’ve updated the HTML collector to use charset.NewReader with the real Content-Type header instead of our custom converter, so incoming pages should now be normalized to proper UTF‑8 and preserve umlauts (e.g. “Näurin”). I’m not very familiar with all the Go idioms here, so I’d really appreciate it if someone could double-check my changes.
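As a rough illustration of the idea of choosing the decoding from the Content-Type header, here is a standard-library-only sketch. It is not the code from this PR and does not use charset.NewReader itself: `decodeBody` is a hypothetical stand-in that handles only UTF-8 and ISO-8859-1, whereas the real package supports many more encodings and can also inspect the content.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// decodeBody is a hypothetical helper for illustration only.
// It reads the charset parameter of a Content-Type header and
// converts the body to a UTF-8 string accordingly.
func decodeBody(body []byte, contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Missing or unparseable header: assume the body is UTF-8.
		return string(body)
	}

	switch strings.ToLower(params["charset"]) {
	case "iso-8859-1", "latin1":
		// Each Latin-1 byte equals its Unicode code point.
		runes := make([]rune, len(body))
		for i, c := range body {
			runes[i] = rune(c)
		}
		return string(runes)
	default:
		// UTF-8 or an unknown charset: pass the bytes through unchanged.
		return string(body)
	}
}

func main() {
	latin1 := []byte{'N', 0xE4, 'u', 'r', 'i', 'n'}
	fmt.Println(decodeBody(latin1, "text/html; charset=ISO-8859-1"))

	utf8Body := []byte("Näurin")
	fmt.Println(decodeBody(utf8Body, "text/html; charset=utf-8"))
}
```

Both calls print the name with an intact umlaut, because the conversion is applied only when the header declares a Latin-1 charset.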
Some names containing umlauts:
- Näurin
- Hidofäs
- König der Toten
- Torbjörn
- Sir Pösi
- Wiliam Lundström
- Der Nachtjäger
- Stählerner Krieger
- Nöber of Guards
- Skalle pär
- Höfix
- Bürgy
- Wächter der Hölle
- Gordon Dödsmetal
- Nöber
Thanks for your PR @Skyliife, but I've created #506 to address only the umlaut issue itself.
Is there any particular reason why we should switch to charset.NewReader?
I can see a possible benefit in using the Content-Type header, but maybe I'm missing something else.
@Skyliife, I didn't notice that the encoding from tibia.com is UTF-8 now, so I should have given you credit in #511.