irc-framework

Encoding autodetection

Open prawnsalad opened this issue 10 years ago • 5 comments

node-irc currently does this using the same iconv-lite lib.

prawnsalad avatar Dec 28 '15 12:12 prawnsalad

Any plans for this?

I'm a bit sceptical about how well auto-detection of arbitrary encodings would work on IRC (very short inputs), but I'd like to see xchat/irssi-style input detection for UTF-8, i.e. if the input is invalid UTF-8, it tries one other predefined encoding.
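For illustration, the UTF-8-then-fallback idea can be sketched with Node built-ins only. This is an assumption-laden sketch, not how irc-framework or irssi actually does it: `decodeLine` and its `fallbackEncoding` parameter are made-up names, and the strict `TextDecoder` stands in for a validator such as utf-8-validate.

```javascript
'use strict';

// Hypothetical helper: decodeLine and fallbackEncoding are illustrative names.
// fallbackEncoding must be an encoding Buffer#toString understands (for
// example 'latin1'); other charsets would need a library such as iconv-lite.
function decodeLine(buf, fallbackEncoding = 'latin1') {
  try {
    // fatal: true makes the decoder throw a TypeError on invalid UTF-8
    // instead of substituting U+FFFD replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(buf);
  } catch (err) {
    // Not valid UTF-8, so assume the one predefined fallback encoding.
    return buf.toString(fallbackEncoding);
  }
}

const utf8Line = Buffer.from('päivää', 'utf8');
const latin1Line = Buffer.from('päivää', 'latin1');

console.log(decodeLine(utf8Line));   // päivää
console.log(decodeLine(latin1Line)); // päivää
```

Both buffers decode to the same text because the Latin-1 bytes are not valid UTF-8 and so take the fallback path. Pure ASCII input is valid UTF-8 and never reaches the fallback.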

node-irc is not using iconv-lite. It uses iconv and node-icu-charset-detector. The difference is that those are native addons, and you have to install libicu manually (iconv seems to be bundled).

It seems most encoding detection modules are based on ICU. There are some JS-only modules too, for example node-chardet.

If going for UTF-8 detection only, there is utf-8-validate. It's a native addon, but at least it doesn't have external dependencies.

I guess I'll toss a coin and test one of these options soon (UTF-8 detection or auto-detect).

apihlaja avatar Aug 17 '16 17:08 apihlaja

utf-8-validate also recently got a fallback JS-only implementation in case the native implementation fails, here. I guess this function accomplishes mostly the same as the native implementation?

Maybe something like this could be used, and if the buffer is not valid UTF-8, decode using a configured fallback encoding instead?
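To make the idea concrete, here is a rough sketch of what a JS-only validity check might look like. It is written from the UTF-8 rules in RFC 3629 and is not the code from utf-8-validate; `isValidUtf8` is a name chosen for this example only.

```javascript
'use strict';

// Hypothetical pure-JS validator. Returns true if buf holds only
// well-formed UTF-8: no overlong forms, no surrogates, nothing above U+10FFFF.
function isValidUtf8(buf) {
  let i = 0;
  while (i < buf.length) {
    const b = buf[i];
    if (b <= 0x7f) { // plain ASCII
      i += 1;
      continue;
    }

    let extra;      // number of continuation bytes expected
    let min = 0x80; // allowed range for the first continuation byte
    let max = 0xbf;
    if (b >= 0xc2 && b <= 0xdf) {
      extra = 1;
    } else if (b >= 0xe0 && b <= 0xef) {
      extra = 2;
      if (b === 0xe0) min = 0xa0; // rejects overlong 3-byte forms
      if (b === 0xed) max = 0x9f; // rejects UTF-16 surrogates
    } else if (b >= 0xf0 && b <= 0xf4) {
      extra = 3;
      if (b === 0xf0) min = 0x90; // rejects overlong 4-byte forms
      if (b === 0xf4) max = 0x8f; // rejects code points above U+10FFFF
    } else {
      return false; // stray continuation byte or invalid lead byte
    }

    if (i + extra >= buf.length) return false; // truncated sequence
    if (buf[i + 1] < min || buf[i + 1] > max) return false;
    for (let k = 2; k <= extra; k++) {
      if ((buf[i + k] & 0xc0) !== 0x80) return false;
    }
    i += extra + 1;
  }
  return true;
}

console.log(isValidUtf8(Buffer.from('päivää', 'utf8')));   // true
console.log(isValidUtf8(Buffer.from('päivää', 'latin1'))); // false
```

When this returns false, the buffer would be handed to the configured fallback encoding instead of being decoded as UTF-8.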

FruitieX avatar Dec 23 '16 18:12 FruitieX

Input detection for utf8 & fallback looks rather trivial but it's not on the roadmap.

@kiwiirc Would you accept a PR?

apihlaja avatar Mar 02 '17 17:03 apihlaja

@apihlaja definitely! I'm not an encoding pro and I just haven't got around to looking further into it yet, so this would be very helpful.

prawnsalad avatar Mar 02 '17 21:03 prawnsalad

So, this was the main thing that bugged me after switching to The Lounge, and I happened to have some spare hacking time over Easter. I tried both autodetection and the irssi-style fallback, and the latter turned out by far the less disastrous alternative.

I would have preferred to have an option in iconv-lite to throw when decoding fails, but implementation through utf-8-validate was simple and functional enough. Feedback appreciated, as this is literally the first time I've done anything with Node. :)
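A small sketch of why a throwing decoder matters here, using Node built-ins rather than iconv-lite or utf-8-validate (that substitution is my assumption, purely to keep the example self-contained). A lenient decode never fails, so it gives no signal that a fallback is needed.

```javascript
'use strict';

const latin1Bytes = Buffer.from('päivää', 'latin1');

// Lenient decoding never fails: invalid bytes become U+FFFD.
const lenient = latin1Bytes.toString('utf8');
console.log(lenient.includes('\ufffd')); // true

// Strict decoding throws, which is the signal a fallback needs.
let strictFailed = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(latin1Bytes);
} catch (err) {
  strictFailed = err instanceof TypeError;
}
console.log(strictFailed); // true
```

Scanning the lenient result for U+FFFD is not a reliable substitute, since a valid UTF-8 message may itself contain that character.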

lorkki avatar Apr 01 '18 17:04 lorkki