from_utf8_unchecked in BufferBackend can violate invariants on invalid index
The reasoning in this SAFETY comment (https://github.com/Robbepop/string-interner/blob/95574e27c0cda19113ab27dc1e8a5f9f9fa5ab8a/src/backend/buffer.rs#L102) looks incorrect to me. It holds for values of index that were returned by this BufferBackend, but when index is untrusted (it could be provided arbitrarily), I think the invariant breaks.
I'll phrase this adversarially with an "attacker", although it also applies to ordinary bugs. The failure case is that index points into the middle (rather than the beginning) of an attacker-controlled string. The attacker can arrange for the bytes at this index to decode as a valid varlen, so that the decoded str_len is any value the attacker chooses. In particular, they can make str_len longer than the remainder of the string that index points into, so that str_bytes includes some of the varlen bytes of the next string, which are not guaranteed to be valid UTF-8. This breaks the invariants of from_utf8_unchecked.
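To make the failure case concrete, here is a minimal sketch of it. It is a simplified model, not the crate's actual code: it assumes a LEB128-style length prefix (7 payload bits per byte, top bit as continuation flag), and encode_len, decode_len, push_str and resolve_bytes are hypothetical names. It returns raw bytes and leaves validation to the checked std::str::from_utf8, so the invalid result shows up as an error rather than as undefined behavior.

```rust
/// Append `value` as a LEB128-style varint (assumed encoding): 7 payload
/// bits per byte, top bit set on every byte except the last.
fn encode_len(buf: &mut Vec<u8>, mut value: usize) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Decode a varint starting at `index`; returns (value, bytes consumed).
fn decode_len(buf: &[u8], index: usize) -> Option<(usize, usize)> {
    let mut value = 0usize;
    let mut shift = 0u32;
    for (i, &byte) in buf.get(index..)?.iter().enumerate() {
        if shift >= usize::BITS {
            return None; // length prefix too long to be valid
        }
        value |= ((byte & 0x7F) as usize) << shift;
        if (byte & 0x80) == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
    }
    None
}

/// Append a length-prefixed string and return the index of its prefix.
fn push_str(buf: &mut Vec<u8>, s: &str) -> usize {
    let index = buf.len();
    encode_len(buf, s.len());
    buf.extend_from_slice(s.as_bytes());
    index
}

/// Hypothetical stand-in for the resolve path: decodes whatever is at
/// `index` as a length and returns that many following bytes, without
/// checking that `index` is really the start of a string.
fn resolve_bytes(buf: &[u8], index: usize) -> Option<&[u8]> {
    let (len, consumed) = decode_len(buf, index)?;
    let start = index + consumed;
    buf.get(start..start.checked_add(len)?)
}

/// Returns true if a forged index yields bytes that are not valid UTF-8.
fn forged_index_yields_invalid_utf8() -> bool {
    let mut buf = Vec::new();
    // Attacker-controlled string whose first byte (0x03) also decodes as a
    // one-byte varint with value 3.
    let first = push_str(&mut buf, "\u{3}ab");
    // A 200-byte string: its length prefix is [0xC8, 0x01], and 0xC8 on its
    // own is not valid UTF-8.
    push_str(&mut buf, &"x".repeat(200));
    // `first + 1` points into the middle of the first entry.
    let forged = resolve_bytes(&buf, first + 1).unwrap();
    std::str::from_utf8(forged).is_err()
}
```

In this model, the forged index yields the bytes 'a', 'b', 0xC8, where 0xC8 is the first length byte of the next string. Passing that slice to from_utf8_unchecked would produce a str that is not valid UTF-8.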
Sadly, the only fix I can currently see is a bit-table on the side which indicates where the start of every string is. This would allow you to safely validate that index indeed points to the beginning of a string rather than the middle.
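A rough sketch of what such a side table could look like, for illustration only. StartBits and CheckedBuffer are hypothetical types, not part of the crate, and the LEB128-style length prefix is an assumption. The sketch keeps one bit per buffer byte and refuses to resolve any index whose bit is not set.

```rust
/// Hypothetical side table: one bit per buffer byte, set only at positions
/// where an entry (its length prefix) begins.
#[derive(Default)]
struct StartBits {
    words: Vec<u64>,
}

impl StartBits {
    fn mark(&mut self, index: usize) {
        let word = index / 64;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << (index % 64);
    }

    fn is_start(&self, index: usize) -> bool {
        self.words
            .get(index / 64)
            .map_or(false, |w| (w & (1u64 << (index % 64))) != 0)
    }
}

/// Hypothetical buffer that validates indices against the side table.
#[derive(Default)]
struct CheckedBuffer {
    bytes: Vec<u8>,
    starts: StartBits,
}

impl CheckedBuffer {
    /// Append a length-prefixed string and record where it starts.
    fn push_str(&mut self, s: &str) -> usize {
        let index = self.bytes.len();
        self.starts.mark(index);
        // Assumed LEB128-style length prefix.
        let mut len = s.len();
        while len >= 0x80 {
            self.bytes.push((len as u8 & 0x7F) | 0x80);
            len >>= 7;
        }
        self.bytes.push(len as u8);
        self.bytes.extend_from_slice(s.as_bytes());
        index
    }

    /// Resolve `index` only if it was recorded as the start of an entry.
    fn resolve(&self, index: usize) -> Option<&str> {
        if !self.starts.is_start(index) {
            return None; // middle of a string, or out of range
        }
        // `index` is a verified start, so the prefix below was written by
        // `push_str` and is well-formed.
        let mut len = 0usize;
        let mut shift = 0u32;
        let mut pos = index;
        loop {
            let byte = *self.bytes.get(pos)?;
            pos += 1;
            len |= ((byte & 0x7F) as usize) << shift;
            if (byte & 0x80) == 0 {
                break;
            }
            shift += 7;
        }
        let bytes = self.bytes.get(pos..pos.checked_add(len)?)?;
        // The sketch stays in safe Rust and validates again here; a real
        // implementation might rely on the start check instead.
        std::str::from_utf8(bytes).ok()
    }
}
```

The cost is one extra bit per buffer byte plus a lookup on every resolve, which is the overhead the suggestion above trades for being able to reject forged indices.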
I guess another option would be to pick an encoding for lengths that always produces valid UTF-8, e.g. a length encoding that never sets the top bit of any byte. Sadly, this wastes 1 bit out of every length byte, although that is less waste than the lookaside bit-table of the previous suggestion.
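As an illustration of such an encoding, here is a minimal sketch that stores 6 payload bits per byte, uses bit 6 as the continuation flag, and always leaves bit 7 clear, so every length byte is ASCII. The names encode_len_ascii and decode_len_ascii and the exact bit layout are assumptions made for this example. The sketch only covers the length bytes themselves: a forged length could still end in the middle of a multi-byte character, which it does not address.

```rust
/// Append `value` using only bytes below 0x80 (hypothetical layout):
/// bits 0-5 carry payload, bit 6 means "more bytes follow", bit 7 is
/// always clear.
fn encode_len_ascii(buf: &mut Vec<u8>, mut value: usize) {
    while value >= 0x40 {
        buf.push((value as u8 & 0x3F) | 0x40);
        value >>= 6;
    }
    buf.push(value as u8);
}

/// Decode a length starting at `index`; returns (value, bytes consumed).
/// Any byte with the top bit set is rejected, since the encoder never
/// produces one.
fn decode_len_ascii(buf: &[u8], index: usize) -> Option<(usize, usize)> {
    let mut value = 0usize;
    let mut shift = 0u32;
    for (i, &byte) in buf.get(index..)?.iter().enumerate() {
        if (byte & 0x80) != 0 || shift >= usize::BITS {
            return None;
        }
        value |= ((byte & 0x3F) as usize) << shift;
        if (byte & 0x40) == 0 {
            return Some((value, i + 1));
        }
        shift += 6;
    }
    None
}

/// Append a length-prefixed string using the ASCII-only length encoding.
fn push_str_ascii(buf: &mut Vec<u8>, s: &str) -> usize {
    let index = buf.len();
    encode_len_ascii(buf, s.len());
    buf.extend_from_slice(s.as_bytes());
    index
}
```

With this layout, a buffer built only through push_str_ascii is a concatenation of ASCII length bytes and valid UTF-8 strings, so the buffer as a whole is valid UTF-8.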
@reinerp Generally, you are right about the soundness of the API. However, since this is an internal API that is only used correctly within the crate, it is at least not something that can be attacked right now. Please correct me if I am wrong.
Still, it would be great if we could improve the situation here. Maybe just marking it as an unsafe API could be enough.
I dug deeper, and it seems that the problem is not just internal to the BufferBackend: unfortunately, it leaks outside via the resolve method, which is bad.