reserve doesn't prevent node/flat hash set/map from rehasing
Describe the bug
Documentation states that reserve(size_t) allocates enough buckets to prevent a container from rehashing
// Sets the number of slots in the `flat_hash_set` to the number needed to
// accommodate at least `count` total elements without exceeding the current
// maximum load factor, and may rehash the container if needed.
However rehasing still might happen. Consider the following example:
{
const size_t N = 50;
absl::flat_hash_set<uint32_t> set(N);
std::vector<decltype(set)::iterator> itrs;
itrs.reserve(N);
for (size_t i = 0; i < N; ++i) {
auto res = set.emplace(i);
itrs.emplace_back(res.first);
}
std::cerr << "Initial load factor:" << set.load_factor() << '\n';
for (size_t i = N+1; i < 1000000000000; ++i) {
auto it = itrs.back();
itrs.pop_back();
set.erase(it); // <-- remove an element first to fit reserved bucket count
auto res = set.emplace(i); // <--- might trigger rehashing
itrs.push_back(res.first);
std::cerr << "Load factor: " << set.load_factor() << '\n'; // <-- doesn't grow
size_t sum = 0;
for (auto it : itrs) {
sum += (*it); // triggers AssertIsFull
}
}
What version of Abseil are you using?
checked against master HEAD and Abseil LTS 20200923, Patch 3
What operating system and version are you using Ubuntu 20.04 LTS
What compiler and version are you using?
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
What build system are you using?
cmake version 3.16.3
Because of probing requirements, erased slots may still occupy capacity1 from the perspective of needing a rehash. Depending on overall fullness of the table, we will do either a growing rehash or a compacting rehash in that case.2
The behavior is correct, although the documentation needs improvement to be more clear.
@fowles got it thanks, is there a way to prevent rehashing?
In the presence of mixed inserts and erases, no. Logically, you get the following properties to work with:
- reserve(N) will guarantee that you will not get a rehash for the next N calls to insert
- erase may provide you with an additional insert before rehash (but it may not) and the exact rule for when depends on the hash of all the elements currently in the table
So if you have an upper bound on the number of inserts calls, you can avoid a rehash. Anything else is playing with fire.