libwally-core Add support for abbreviated bip39 mnemonic elements

Here's a simple set of changes that adds support for bip39 mnemonic elements abbreviated to 4 (or more) characters. It does this by implementing a utf8_strncmp() function that tries to account for utf8 multibyte characters (counting a character encoded in multiple bytes as a single character). Some characters are actually composed of two characters: for example, in Hiragana (Japanese) a sound mark can be added to the preceding character, in which case only a single character should be counted. In any case, the utf8_strncmp() function should work on the character sets supported by libwally-core as of this PR, but it might need to be updated if other character sets are added.

One caveat with these changes is that they assume the wordlist code is only intended to handle words with 4 significant characters. If it's intended to be more general, then the number of significant characters should be parameterized.
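
To illustrate the idea, here is a rough standalone sketch of a character-counting compare. It is not the code in this PR: the names utf8_strncmp_sketch, seq_len, is_combining and prefix_bytes are made up for the example, the input is assumed to be valid NFKD-normalized UTF-8, and only the combining ranges U+0300..U+036F and U+3099/U+309A are treated as "attached to the preceding character". The number of significant characters is passed in as n rather than fixed at 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence that starts with lead byte c. */
static size_t seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1; /* stray continuation/invalid byte: step over it */
}

/* Assumption: only these combining marks matter for the example.
 * U+0300..U+036F encode as CC 80..CD AF; U+3099/U+309A as E3 82 99/9A. */
static int is_combining(const unsigned char *s)
{
    if (s[0] == 0xCC && s[1] >= 0x80 && s[1] <= 0xBF) return 1;
    if (s[0] == 0xCD && s[1] >= 0x80 && s[1] <= 0xAF) return 1;
    if (s[0] == 0xE3 && s[1] == 0x82 && (s[2] == 0x99 || s[2] == 0x9A)) return 1;
    return 0;
}

/* Number of bytes covered by the first n characters of str, where a base
 * character plus any following combining marks counts as one character. */
static size_t prefix_bytes(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0, chars = 0;

    while (s[i]) {
        size_t len, k;
        if (!is_combining(s + i)) {
            if (chars == n) break; /* next base character is past the limit */
            chars++;
        }
        len = seq_len(s[i]);
        for (k = 0; k < len && s[i]; k++) i++; /* never step past the NUL */
    }
    return i;
}

/* Returns 0 if the first n characters of a and b are equal, otherwise a
 * negative or positive value. */
int utf8_strncmp_sketch(const char *a, const char *b, size_t n)
{
    size_t la, lb, m;
    int r;

    assert(a && b);
    la = prefix_bytes(a, n);
    lb = prefix_bytes(b, n);
    m = la < lb ? la : lb;
    r = memcmp(a, b, m);
    if (r) return r;
    return (la > lb) - (la < lb);
}
```

With this sketch, utf8_strncmp_sketch("abandon", "aban", 4) compares equal, while a kana followed by a combining voiced sound mark is counted as one character and does not compare equal to the bare kana.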

May resolve #89

Feb 28 '19 18:02 mword

Rather than changing the current code, I think it would be best if all we did was add a new function that, given a 4-character word, returns the matching full word. Users of wally can then easily convert from the short form to the full form.

Apr 22 '19 13:04 greenaddress

@greenaddress I can see where a tool to convert short-form to full-form mnemonic strings might be useful, but the two-step process of 1. converting the short-form string to a full-form string and then 2. doing full-word string compares on the full words seems unnecessarily inefficient (as opposed to just 1. doing 4-character compares on an any-form mnemonic). Maybe I'm missing something?

Also, with the changes I made, to get a full-form mnemonic string from a short-form string one need only do a mnemonic_to_bytes followed by a mnemonic_from_bytes. In fact, along with cloning wordlist_lookup_word (as wordlist_lookup_abbreviated_word?) I'd probably want to do just that to do the conversion.
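
Roughly, the conversion could look like the toy sketch below. It does not call the real mnemonic_to_bytes/mnemonic_from_bytes; it only mimics the round trip (word to index, index back to full word) without the byte packing or checksum. The names toy_words, toy_lookup and toy_expand, the four-word list and the ASCII-only 4-byte compare are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG_CHARS 4 /* significant characters, as assumed in this PR */

/* Hypothetical stand-in for a wordlist (ASCII only). */
static const char *const toy_words[] = { "abandon", "ability", "able", "about" };
#define TOY_COUNT (sizeof(toy_words) / sizeof(toy_words[0]))

/* Index of the word whose first SIG_CHARS bytes match the token, or -1. */
static long toy_lookup(const char *tok, size_t tok_len)
{
    size_t i;
    size_t a = tok_len < SIG_CHARS ? tok_len : SIG_CHARS;

    for (i = 0; i < TOY_COUNT; i++) {
        size_t wlen = strlen(toy_words[i]);
        size_t b = wlen < SIG_CHARS ? wlen : SIG_CHARS;
        if (a == b && memcmp(tok, toy_words[i], a) == 0)
            return (long)i;
    }
    return -1;
}

/* Expands a space-separated any-form mnemonic into full words in out.
 * Returns 0 on success, -1 on an unknown word or if out is too small. */
int toy_expand(const char *mnemonic, char *out, size_t out_len)
{
    const char *p = mnemonic;
    size_t used = 0;

    assert(mnemonic && out);
    if (out_len == 0) return -1;
    out[0] = '\0';

    while (*p) {
        size_t tok_len, wlen;
        long idx;

        while (*p == ' ') p++; /* skip separators */
        if (!*p) break;
        tok_len = strcspn(p, " ");

        idx = toy_lookup(p, tok_len); /* "to bytes" half: word -> index */
        if (idx < 0) return -1;

        wlen = strlen(toy_words[idx]); /* "from bytes" half: index -> word */
        if (used + wlen + 2 > out_len) return -1; /* conservative: separator + NUL */
        if (used) out[used++] = ' ';
        memcpy(out + used, toy_words[idx], wlen);
        used += wlen;
        out[used] = '\0';

        p += tok_len;
    }
    return 0;
}
```

In this toy version, expanding "aban abil able abou" yields "abandon ability able about", and a mnemonic that mixes full and abbreviated words expands the same way.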

Your call.

Apr 23 '19 22:04 mword

@greenaddress I have a counterproposal -- can we add a "unique_prefix" flag to wordlist_lookup_word, default false, which enables the following behavior: if there is exactly one word in the wordlist with "word" as a prefix, return that word, otherwise fail. (Alternatively this could be a new wordlist_lookup_word_unique_prefix function.) Then add the same flag to mnemonic_to_bytes and pass it through.

This doesn't require any special handling for unicode, doesn't require hardcoding the four-character prefix rule for bip39 wordlists (but works with any correct use of four-letter truncated mnemonics), and generally does exactly what we want with minimal changes.
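
A minimal sketch of what I mean, written against a plain array of strings rather than the real wordlist struct. The function name lookup_unique_prefix and the demo_words array are hypothetical. One thing the sketch adds on top of the rule as stated: an exact full-word match is accepted even if it is also a prefix of other words, since the English list contains pairs like "act" and "action"; that choice is my assumption here and would need to be agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a wordlist; any order works. */
static const char *const demo_words[] = {
    "abandon", "ability", "able", "about", "above", "absent",
    "act", "action", "actor"
};
#define DEMO_COUNT (sizeof(demo_words) / sizeof(demo_words[0]))

/* Returns the index of the word identified by prefix, or -1.
 * - An exact full-word match always wins (assumption, see above).
 * - Otherwise exactly one word must start with prefix.
 * Plain byte comparison: no Unicode handling and no fixed prefix length. */
long lookup_unique_prefix(const char *const *words, size_t count,
                          const char *prefix)
{
    size_t plen, i, matches = 0;
    long match = -1;

    assert(words && prefix);
    plen = strlen(prefix);
    if (plen == 0) return -1; /* an empty prefix matches everything */

    for (i = 0; i < count; i++) {
        if (strncmp(words[i], prefix, plen) != 0)
            continue;
        if (words[i][plen] == '\0')
            return (long)i; /* exact full-word match */
        match = (long)i;
        matches++;
    }
    return matches == 1 ? match : -1;
}
```

On the demo list, "aban" and "abou" each resolve to a single word, "abo" fails because both "about" and "above" match, and "act" resolves to the full word "act" rather than failing.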

May 15 '19 20:05 gwillen