Bug: Last elements in `basic_charset` initialization are discarded
Describe the bug
When initializing a basic_charset with character arrays such as those returned by digits() and others, the last element of the array is incorrectly discarded.
https://github.com/ashvardanian/StringZilla/blob/152ed046701aceae2c3d4c995b0b661a9aa06d95/include/stringzilla/stringzilla.hpp#L192-L195
This is due to the use of count_characters - 1 in the loop that populates the basic_charset bitset.
https://github.com/ashvardanian/StringZilla/blob/152ed046701aceae2c3d4c995b0b661a9aa06d95/include/stringzilla/stringzilla.hpp#L283-L289
While this prevents including a null terminator for string literals like sz::char_set("x"), it causes incorrect behavior when handling character arrays that do not have a null terminator, resulting in the exclusion of the final character.
I believe this local block is the full extent of the affected code.
https://github.com/ashvardanian/StringZilla/blob/152ed046701aceae2c3d4c995b0b661a9aa06d95/include/stringzilla/stringzilla.hpp#L330-L341
Steps to reproduce
#include "stringzilla/stringzilla.hpp"
namespace sz = ashvardanian::stringzilla;
int main() {
sz::string haystack = "239";
// Test with null-terminated string
assert(haystack.contains_only(sz::char_set("0123456789")));
// Passes: null terminator is correctly discarded
// Test with initializer list
static std::initializer_list all = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
assert (haystack.contains_only(sz::char_set {all}));
// Passes: constructor for initializer list is called
// Test with carray
sz::carray<10> all_digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
assert(haystack.contains_only(sz::char_set {all_digits}));
assert(haystack.is_digit());
// Fails: '9' is incorrectly discarded
}
Expected behavior
No asserts
StringZilla version
v3.11.0
Operating System
Ubuntu 22.04.5
Hardware architecture
x86
Which interface are you using?
C++ bindings
Contact Details
No response
Are you open to being tagged as a contributor?
- [X] I am open to being mentioned in the project
.githistory as a contributor
Is there an existing issue for this?
- [X] I have searched the existing issues
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I will be able to write fix and add tests tomorrow
Proposed Fix
-
Using
carraywith Null Terminator
inline carray<11> const &digits() noexcept {
static carray<11> const all = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '\0'};
return all;
}
-
Using
std::initializer_list<char>
inline auto const &digits() noexcept {
static std::initializer_list all = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
return all;
}
Then the constructor for the initializer list will be called here: https://github.com/ashvardanian/StringZilla/blob/152ed046701aceae2c3d4c995b0b661a9aa06d95/include/stringzilla/stringzilla.hpp#L335-L335
Thank you @alexbarev! Let's start with tests, and later merge a patch.