Update to latest libdatrie, move libdatrie to submodule, Unicode support
Hi @tyler,
I made some updates to this gem that hopefully you could consider incorporating.
- The libdatrie source files baked into the fast_trie gem are quite old, version 0.1.99. This PR updates to the latest libdatrie at https://github.com/tlwg/libdatrie, version 0.2.13.
- libdatrie is licensed under LGPL-2.1, but this gem is licensed under MIT. To clear up any licensing confusion, added LGPL-2.1 to the gemspec and removed all libdatrie source files. They are now pulled in via git submodule. Renamed the `trie.c` with the Ruby extension code to `trie_ext.c`.
- libdatrie now requires an `AlphaMap` specifying the alphabet for the trie. Created a new `AlphaMap` class in Ruby to represent this, and an `AlphaMap` can be passed in to `Trie.new`. If not specified, a default `AlphaMap` is used with range 0x00-0xff.
- libdatrie now supports Unicode. Added support for Unicode. Strings are converted to UTF-32 to map to libdatrie's `AlphaChar` type, and converted to UTF-8 on the way back.
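For reviewers unfamiliar with the encoding step, the UTF-8 to UTF-32 round trip can be sketched in plain Ruby. This is only an illustration built on core `String` and `Array` methods, not the C code in the extension; the helper names `to_alpha_chars`, `from_alpha_chars`, and `fits_default_range?` are made up for this example.

```ruby
# Illustrative only: these helper names are hypothetical, not part of the gem.

# Turn a Ruby string into an array of Unicode code points. Each code point
# fits in 32 bits, which is the shape a UTF-32 (AlphaChar-style) string has.
def to_alpha_chars(str)
  str.encode("UTF-8").codepoints
end

# Turn an array of code points back into a UTF-8 encoded Ruby string.
def from_alpha_chars(chars)
  chars.pack("U*")
end

# True when every code point of the string lies in the 0x00-0xff range,
# i.e. the range the default alphabet is described as covering.
def fits_default_range?(str)
  to_alpha_chars(str).all? { |cp| cp.between?(0x00, 0xff) }
end
```

Under these assumptions, a key containing characters above 0xff would need an `AlphaMap` with a wider range than the default.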
Changes to Trie Ruby bindings:
- Updated to use the `TypedData_Wrap_Struct` macros instead of `Data_Wrap_Struct`, etc.
- Support for `Marshal.load`, `Marshal.dump`
- There is now an iterator interface to libdatrie which probably didn't exist when this gem was written. Rewrote the `walk` and `children*` methods to use the `trie_iterator` interface. This got rid of a `char prefix[1024]` fixed-size buffer that was a potential buffer overflow.
- The gem supported passing any Ruby `VALUE` as the weight for a key. This has become problematic because `TrieData` in libdatrie is declared as `uint32_t`, but on modern 64-bit systems, `VALUE` is 64 bits because it's the size of a pointer. I fixed this by removing the ability to specify any type for the weight but fixnum. Having any `VALUE` in there was also problematic for reading/writing to a file, since it would write memory pointers.
- Added `add?` method to match `trie_store_if_absent`
- Added `concat` method like https://github.com/gonzedge/rambling-trie to add a whole array at once, with added support for weights
- Updated specs with Unicode test cases and tests for the new methods.
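To make the weight restriction concrete, here is a small Ruby sketch. It takes the description above at face value, that a weight has to fit in a 32-bit unsigned value; the constant and method names are hypothetical, and the actual check and on-disk byte order used by the extension and libdatrie may differ.

```ruby
# Hypothetical names; assumes a 32-bit unsigned weight as described above.
MAX_WEIGHT = 0xFFFF_FFFF

# A weight is acceptable only if it is an Integer that fits in 32 bits.
def valid_weight?(weight)
  weight.is_a?(Integer) && weight.between?(0, MAX_WEIGHT)
end

# A fixed-width integer serializes to the same 4 bytes on every machine,
# unlike a pointer-sized VALUE, whose bits are only meaningful in-process.
def pack_weight(weight)
  unless valid_weight?(weight)
    raise ArgumentError, "weight must be an Integer in 0..#{MAX_WEIGHT}"
  end
  [weight].pack("N") # 32-bit unsigned, big-endian (byte order is an assumption)
end

# Read a weight back from its 4-byte representation.
def unpack_weight(bytes)
  bytes.unpack1("N")
end
```

The point of the sketch is that an integer weight survives a write and a read unchanged, whereas an arbitrary Ruby object stored as a raw `VALUE` would not.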
Thanks for writing this gem way back when!