
Feature request: Searching by name

Open themobydisk opened this issue 10 years ago • 8 comments

I want to make an app that searches Unicode characters by name. It looks like the UnicodeInfo class only lets me look up information by character. For example, if I want to find a Unicode music note, I want to call something like:

    // Returns 2669 - 266C, 1D13B - 1D164, 1F3B5, etc.
    IEnumerable<int> matchingCharacters = UnicodeInfo.FindByName("note");

themobydisk avatar Jan 07 '16 19:01 themobydisk

Hello,

What you are requesting is a full-text index on name data. Providing a full-text search algorithm is well outside the scope of this library, but there are tools that can provide this feature, like SQLite. If you wish, you can already build such an index yourself by requesting the name for every possible code point (0x0000 to 0x10FFFF).

I may implement a few helper methods for scenarios like this one, allowing you to enumerate things like valid code points or known names, but that would be it.

hexawyz avatar Jan 09 '16 15:01 hexawyz

Okay, fair enough. Probably not a good fit for this project. Thanks for the reply!

themobydisk avatar Jan 10 '16 16:01 themobydisk

Actually you'll be surprised how simple it is, even without a full text search engine. Here's the sample code that works for me:

        private Dictionary<int, string> descriptions = new Dictionary<int, string>();

        private void BuildUnicodeList()
        {
            var blocks = UnicodeInfo.GetBlocks();

            foreach (var block in blocks)
            {
                foreach (var codepoint in block.CodePointRange)
                {
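                    // Skip the surrogate range U+D800..U+DFFF. Compare as int:
                    // casting to char would truncate code points above U+FFFF.
                    if (codepoint >= 0xD800 && codepoint <= 0xDFFF)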
                    {
                        continue;
                    }

                    var charInfo = UnicodeInfo.GetCharInfo(codepoint);
                    var displayText = charInfo.Name;
                    if (displayText != null)
                    {
                        descriptions[codepoint] = displayText;
                    }
                }
            }
        }

...
            var sb = new StringBuilder();
            int hitcount = 0;
            foreach (var d in descriptions)
            {
                if (hitcount > 20)
                {
                    return sb.ToString();
                }

                if (d.Value.IndexOf(input, StringComparison.OrdinalIgnoreCase) > -1)
                {
                    // AppendLine has no int overload; emit the code point as hex.
                    sb.AppendLine(d.Key.ToString("X4"));
                    hitcount++;
                }
            }

            if (sb.Length > 0)
            {
                return sb.ToString();
            }

KirillOsenkov avatar Sep 05 '17 20:09 KirillOsenkov

The performance on my machine is about 70-80 ms per lookup. An in-memory trie or other index could speed it up significantly, but if you're OK with these numbers then it works great and is super simple.
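
To illustrate the "other index" option, here is a minimal sketch of a word-level inverted index built from a code point to name dictionary like `descriptions` above. `NameWordIndex`, `Build`, and `Find` are hypothetical names, not part of UnicodeInformation or QuickInfo. Note that it swaps the substring scan for whole-word matching, so a query for "note" would find MUSICAL NOTE but not BEAMED EIGHTH NOTES.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any library in this thread.
public static class NameWordIndex
{
    // Maps each word of a character name to the code points whose name contains it.
    public static Dictionary<string, List<int>> Build(IReadOnlyDictionary<int, string> descriptions)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
        foreach (var entry in descriptions)
        {
            // Assumption: words in a name are separated by spaces or hyphens.
            var words = entry.Value.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                if (!index.TryGetValue(word, out var codePoints))
                {
                    codePoints = new List<int>();
                    index[word] = codePoints;
                }

                // A name may repeat a word; record each code point once per word.
                if (codePoints.Count == 0 || codePoints[codePoints.Count - 1] != entry.Key)
                {
                    codePoints.Add(entry.Key);
                }
            }
        }

        return index;
    }

    // Whole-word, case-insensitive lookup; returns an empty list when nothing matches.
    public static IReadOnlyList<int> Find(Dictionary<string, List<int>> index, string word)
    {
        return index.TryGetValue(word, out var codePoints)
            ? codePoints
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

Build the index once after `BuildUnicodeList()` has filled `descriptions`, then call `NameWordIndex.Find(index, input)` per query. Each lookup is a single dictionary probe instead of a scan over every name, at the cost of losing partial-word matches.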

KirillOsenkov avatar Sep 05 '17 20:09 KirillOsenkov

Nice solution with so little code. 👍 It would likely be enough for most scenarios.

I did write some code that you can use to create an index of Unicode characters, but it's not production ready. (@themobydisk, I apologize to you. I had totally forgotten about that issue… :( )

You can try it and/or benchmark it if you want: https://gist.github.com/GoldenCrystal/0071772cd111ac4b45b21470f1ac101f

It needs a bit of cleaning, but as far as I remember, the code was working. Once it is cleaned up a bit, I will include it as an example instead of letting it rot on my hard drive…

hexawyz avatar Sep 06 '17 23:09 hexawyz

BTW I'm using my naive lookup algorithm here: http://quickinfo.io/?char%20cherries

You can try searching for various emoji names, paste emoji to view their info, etc.

KirillOsenkov avatar Sep 07 '17 22:09 KirillOsenkov

FYI I've now implemented fast indexed lookup of Unicode characters here: https://github.com/KirillOsenkov/QuickInfo/blob/a1b9880c0beeaaa1472d14d78e2799399795c657/src/QuickInfo/Processors/Unicode.cs#L103-L113

It requires creating an index like this: https://github.com/KirillOsenkov/QuickInfo/blob/a1b9880c0beeaaa1472d14d78e2799399795c657/src/QuickInfo/Processors/Unicode.cs#L178

The helper code is here: https://github.com/KirillOsenkov/QuickInfo/blob/master/src/QuickInfo/Utilities/SortedSearch.cs

Hope this helps!

KirillOsenkov avatar Mar 12 '18 05:03 KirillOsenkov

I really just want an easy Name -> Char lookup. No need for search.
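
An exact name lookup like this could be sketched by inverting a code point to name dictionary such as the `descriptions` map built earlier in this thread. `NameLookup`, `Invert`, and `ToText` are hypothetical names, not library API. The result is a string rather than a `char` because code points above U+FFFF do not fit in a single `char`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of UnicodeInformation.
public static class NameLookup
{
    // Inverts a code point -> name map into a case-insensitive name -> code point map.
    public static Dictionary<string, int> Invert(IReadOnlyDictionary<int, string> descriptions)
    {
        var byName = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var entry in descriptions)
        {
            // Names are expected to be unique; if one repeats, keep the first code point seen.
            if (!byName.ContainsKey(entry.Value))
            {
                byName[entry.Value] = entry.Key;
            }
        }

        return byName;
    }

    // Returns the character as a string, or null when the name is unknown.
    // Assumes the map holds no surrogate code points, which ConvertFromUtf32 rejects.
    public static string ToText(Dictionary<string, int> byName, string name)
    {
        return byName.TryGetValue(name, out var codePoint)
            ? char.ConvertFromUtf32(codePoint)
            : null;
    }
}
```

With `var byName = NameLookup.Invert(descriptions);` built once, `NameLookup.ToText(byName, "EIGHTH NOTE")` is a single dictionary probe.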

AndreasVolkmann avatar Jul 10 '24 01:07 AndreasVolkmann