abseil-cpp icon indicating copy to clipboard operation
abseil-cpp copied to clipboard

reserve doesn't prevent node/flat hash set/map from rehasing

Open gnusi opened this issue 4 years ago • 3 comments

Describe the bug

Documentation states that reserve(size_t) allocates enough buckets to prevent a container from rehashing

  // Sets the number of slots in the `flat_hash_set` to the number needed to
  // accommodate at least `count` total elements without exceeding the current
  // maximum load factor, and may rehash the container if needed.

However rehasing still might happen. Consider the following example:

 {
    const size_t N = 50;
    absl::flat_hash_set<uint32_t> set(N);
    std::vector<decltype(set)::iterator> itrs;
    itrs.reserve(N);

    for (size_t i = 0; i < N; ++i) {
      auto res = set.emplace(i);
      itrs.emplace_back(res.first);
    }

    std::cerr << "Initial load factor:" << set.load_factor() << '\n';

    for (size_t i = N+1; i < 1000000000000; ++i) {
      auto it = itrs.back();
      itrs.pop_back();
      set.erase(it); // <-- remove an element first to fit reserved bucket count
      auto res = set.emplace(i); // <--- might trigger rehashing
      itrs.push_back(res.first);

      std::cerr << "Load factor: " << set.load_factor() << '\n'; // <-- doesn't grow

      size_t sum = 0;
      for (auto it : itrs) {
        sum += (*it); // triggers AssertIsFull
      }
    }

What version of Abseil are you using? checked against master HEAD and Abseil LTS 20200923, Patch 3

What operating system and version are you using Ubuntu 20.04 LTS

What compiler and version are you using?

COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 

What build system are you using?

cmake version 3.16.3

gnusi avatar Feb 25 '21 19:02 gnusi

Because of probing requirements, erased slots may still occupy capacity1 from the perspective of needing a rehash. Depending on overall fullness of the table, we will do either a growing rehash or a compacting rehash in that case.2

The behavior is correct, although the documentation needs improvement to be more clear.

fowles avatar Feb 25 '21 22:02 fowles

@fowles got it thanks, is there a way to prevent rehashing?

gnusi avatar Feb 25 '21 22:02 gnusi

In the presence of mixed inserts and erases, no. Logically, you get the following properties to work with:

  1. reserve(N) will guarantee that you will not get a rehash for the next N calls to insert
  2. erase may provide you with an additional insert before rehash (but it may not) and the exact rule for when depends on the hash of all the elements currently in the table

So if you have an upper bound on the number of inserts calls, you can avoid a rehash. Anything else is playing with fire.

fowles avatar Feb 26 '21 05:02 fowles