svector icon indicating copy to clipboard operation
svector copied to clipboard

Add emplace_back_unchecked

Open Andersama opened this issue 2 years ago • 6 comments

I don't expect this to be accepted, however I actually wrote a short paper on this a few years ago. I tend to add this utility and another for modifying the size to my own custom vectors to be able to get every last bit of performance where I can. You can get an easy 100% performance swing with this over emplace_back.

Even better is actually this pattern:

std::vector<char> a_pseudo_string;
a_pseudo_string.reserve(32);

char *danger_ptr = a_pseudo_string.data();
for (size_t i = 0; i < 32; i++) {
  danger_ptr[i] = i+32;
}
// an incredibly awkward use of assign
a_psudo_string.assign(danger_ptr, danger_ptr+32);

Where assign also can get an unchecked variation. Since the internals of svector already has

    template <direction D>
    void set_size(size_t s) {
        if constexpr (D == direction::direct) {
            set_direct_and_size(s);
        } else {
            indirect()->size(s);
        }
    }

The second is already possible*.

Alternatively could be named unchecked_emplace_back.

Andersama avatar Apr 16 '23 17:04 Andersama

I think this is what David Stone proposes ("append_from_capacity") in his CppCon talk

Philippe91 avatar Apr 16 '23 17:04 Philippe91

I googled a bit, I think an API like here in P1010R1 sounds good to me:

  • T* uninitialized_data() - Returns a pointer to storage that would back elements [size(), capacity()).
  • insert_from_capacity(size_type n) Appends n elements from capacity. The application must have initialized the storage backing these elements otherwise the behavior is undefined. (or name it append_from_capacity)

martinus avatar Apr 16 '23 18:04 martinus

It's not a new idea, I was about to submit my paper to the standards committee but someone mentioned that a similar idea was rejected years beforehand. I'm not surprised though that it keeps getting brought up, it's a pretty easy performance improvement when you know you can use it.

insert_from_capacity's a strange name since insert's used to place data essentially "anywhere". I tend to call mine append_unchecked or unchecked_append since for whatever reason append is not defined for vector, but is for strings.

Here's some old nanobench results from my draft from 2 years ago.

| MSVC x64 debug
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               91.60 |       10,917,030.57 |    0.5% |      0.00 | `emplace_back`
|               31.82 |       31,421,838.18 |    0.4% |      0.00 | `unchecked_emplace_back`
|                9.04 |      110,663,983.90 |    3.4% |      0.00 | `assign`
|                8.65 |      115,606,936.42 |    0.1% |      0.00 | `self_assign`

| MSVC x64 release
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.97 |    1,026,392,961.88 |    0.1% |      0.00 | `emplace_back`
|                0.89 |    1,126,637,554.59 |    0.1% |      0.00 | `unchecked_emplace_back`
|                0.57 |    1,741,245,136.19 |    0.1% |      0.00 | `assign`
|                0.53 |    1,899,335,232.67 |    0.0% |      0.00 | `self_assign`

| Clang x64 debug
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               21.06 |       47,483,380.82 |    0.1% |      0.00 | `emplace_back`
|               20.40 |       49,019,607.84 |    0.1% |      0.00 | `unchecked_emplace_back`
|                2.78 |      359,640,359.64 |    0.1% |      0.00 | `assign`
|                2.71 |      369,290,573.37 |    0.0% |      0.00 | `self_assign`

| Clang x64 release
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.77 |      564,440,263.41 |    0.1% |      0.00 | `emplace_back`
|                0.80 |    1,256,544,502.62 |    0.2% |      0.00 | `unchecked_emplace_back`
|                0.55 |    1,803,127,874.89 |    0.0% |      0.00 | `assign`
|                0.53 |    1,873,767,258.38 |    0.0% |      0.00 | `self_assign`

It somewhat depends on the complexity of the type of course, but as you can see using assign's WAY faster. I think the strange result was actually that MSVC seemingly out optimized clang at the time for regular emplace_back. I can't remember why I think it was that msvc managed to optimize with SIMD instructions, when the conditional check was removed from both I think the optimizers got a fairly decent shot optimizing the append.

Andersama avatar Apr 17 '23 10:04 Andersama

Nice nanobench benchmarks @Andersama :smile: I also don't like insert_from_capacity. Digging more into standardization, the last status of P1010 is that it should be revised, based on the accepted resize_and_overwrite: https://github.com/cplusplus/papers/issues/322

This will be in C++23 for std::string, and It would probably be best to have the same API for vector: https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite

martinus avatar Apr 18 '23 04:04 martinus

Taking a look at the cppreference example

food.resize_and_overwrite(
        resize_to,
        [food_size = food.size()](char* p, [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) n) noexcept -> [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) {
            // p[0]..p[n] is the assignable range
            // p[0]..p[min(n, food_size) - 1] is the readable range
            // (contents initially equal to the original string)
 
            // Debug print:
            [std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); n: " << n << '\n';
 
            // Copy fruits to the buffer p while there is enough space.
            char* first = p + food_size;
 
            for (char* const end = p + n; const [std::string_view](https://en.cppreference.com/w/cpp/string/basic_string_view) fruit : fruits) {
                char* last = first + fruit.size() + 1;
                if (last > end)
                    break;
                *first++ = ' ';
                std::[ranges::copy](https://en.cppreference.com/w/cpp/algorithm/ranges/copy)(fruit, first);
                first = last;
            }
 
            const auto final_size { static_cast<[std::size_t](https://en.cppreference.com/w/cpp/types/size_t)>(first - p) };
 
            // Debug print:
            [std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); final_size: " << final_size << '\n';
 
            [assert](https://en.cppreference.com/w/cpp/error/assert)(final_size <= n);
            return final_size; // Return value is the actual new length
                               // of the string, must be in range 0..n
        });

I have the distinct feeling the signature is confusing. If I were to write the callback signature I'd include the old size in the parameters, or alternatively pass the original vector into the callback. It looks like the current signature is passing the entire range via the pointer and the new capacity essentially forcing the user to create a lambda to capture the size.

I'd consider for example:

food.lazy_append(new_capacity, [](char *ptr, size_t old_size, size_t new_capacity){
// now ptr[0..old_size-1] is still available but we don't have to be capturing it ourselves
// bonus we can use c functions as a callback meaning this'd be backwards compatible with older c++ standards and could be done without a template
});

//bonus, the implementation's probably the simplest thing I can think of
void lazy_append(size_t new_capacity, void(*callback)(value_type *, size_t, size_t)) {
   reserve(new_capacity);
   callback(data(), size(), capacity());
}

Andersama avatar Apr 19 '23 12:04 Andersama

Now that I'm thinking about it, since I gaffed on the initially commit, this kind of vector implementation should likely use a different variation of unchecked emplace back given which mode the vector's in.

Andersama avatar Apr 25 '23 15:04 Andersama