Sam Hardwick
Is the only additional issue besides endianness datatype size? If so, I would be prepared to write a portable mode for the optimized-lookup conversion process and ospell reading.
I have a somewhat simpler proposal, at least for ospell's needs. The optimized-lookup format doesn't need signed integers, and I think we could quite simply detect endianness and flip byte...
In effect I mean a function like bool is_big_endian(void) { union { uint32_t i; int8_t c[4]; } to_check = {0x01000000}; return to_check.c[0] == 1; } and writing/reading functions that are...
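For illustration, the detect-and-flip idea could be sketched as below. This is a minimal sketch, not the actual ospell or optimized-lookup code: the helper names `swap_u32`, `read_u32_le` and `write_u32_le` are hypothetical, and it assumes the on-disk byte order is little-endian, which the thread does not state.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Runtime endianness check, following the function quoted in the thread. */
static bool is_big_endian(void)
{
    union { uint32_t i; uint8_t c[4]; } to_check = { 0x01000000 };
    return to_check.c[0] == 1;
}

/* Reverse the byte order of a 32-bit unsigned value. */
static uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical reader. ASSUMPTION: the file stores values little-endian. */
static uint32_t read_u32_le(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);          /* avoids unaligned access */
    return is_big_endian() ? swap_u32(v) : v;
}

/* Hypothetical writer, the mirror image of read_u32_le. */
static void write_u32_le(unsigned char *buf, uint32_t v)
{
    if (is_big_endian())
        v = swap_u32(v);
    memcpy(buf, &v, sizeof v);
}
```

Since the format only needs unsigned integers, one swap routine per width would cover every field, and little-endian hosts pay only the cost of the check.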
Also: I think it would be reasonable to go always-weighted.
The nice thing about using 32 bits for bools is that it preserves compatibility with existing binary transducers. There are 9 bools in the header, so it's only a waste...
But do you still think we need a new 4.0 format rather than just interoperating with 3.0? After all, optimized-lookup is the only binary format under our own control.
Okay. When you say "the header", are you talking about the HFST3 header, the optimized-lookup header or both? Are you planning on getting rid of the separate optimized-lookup header?
Okay. And for the optimized-lookup header, is making the bools 8-bit instead of 32-bit the only change you're planning? Perhaps we should encode them all in two bytes if we're...
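To make the two-byte option concrete: nine 32-bit bools take 36 bytes, nine 8-bit bools take 9 bytes, and a bitfield takes 2. A rough sketch of the packing follows. `NUM_HEADER_FLAGS`, `pack_flags` and `unpack_flags` are hypothetical names, and the bit order (flag i in bit i) is an assumption, not something the existing header defines.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_HEADER_FLAGS 9  /* the thread mentions 9 bools in the header */

/* Pack the flags into a 16-bit field; flag i goes into bit i (assumed order). */
static uint16_t pack_flags(const bool flags[NUM_HEADER_FLAGS])
{
    uint16_t packed = 0;
    for (int i = 0; i < NUM_HEADER_FLAGS; ++i) {
        if (flags[i])
            packed |= (uint16_t)(1u << i);
    }
    return packed;
}

/* Recover the individual flags from the 16-bit field. */
static void unpack_flags(uint16_t packed, bool flags[NUM_HEADER_FLAGS])
{
    for (int i = 0; i < NUM_HEADER_FLAGS; ++i)
        flags[i] = ((packed >> i) & 1u) != 0;
}
```

The trade-off is that a packed field is not readable by existing 3.0 readers, whereas 32-bit bools keep that compatibility.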
I'm just still not completely sold on the need for varints + zigzag. The only thing in the HFST3 header with binary data is the header length field, and that's...
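For reference, this is roughly what varints plus zigzag would involve. It is a generic sketch of the LEB128-style scheme used by formats such as Protocol Buffers, not a proposal for the HFST header; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ZigZag: map signed to unsigned so small magnitudes get small codes
   (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...). */
static uint32_t zigzag_encode(int32_t n)
{
    return ((uint32_t)n << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

static int32_t zigzag_decode(uint32_t z)
{
    return (int32_t)(z >> 1) ^ -(int32_t)(z & 1u);
}

/* Varint: 7 payload bits per byte, high bit set on all but the last byte.
   Writes at most 5 bytes to out and returns the number written. */
static size_t varint_encode(uint32_t v, unsigned char *out)
{
    size_t n = 0;
    while (v >= 0x80u) {
        out[n++] = (unsigned char)(v | 0x80u);
        v >>= 7;
    }
    out[n++] = (unsigned char)v;
    return n;
}

/* Returns the number of bytes consumed, or 0 if the input is truncated
   or longer than 5 bytes. */
static size_t varint_decode(const unsigned char *in, size_t len, uint32_t *v)
{
    uint32_t result = 0;
    for (size_t n = 0; n < len && n < 5; ++n) {
        result |= (uint32_t)(in[n] & 0x7Fu) << (7 * n);
        if (!(in[n] & 0x80u)) {
            *v = result;
            return n + 1;
        }
    }
    return 0;
}
```

The encoding is byte-order independent by construction, but it makes field offsets variable, which is part of the cost being weighed here.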