Add emplace_back_unchecked
I don't expect this to be accepted, however I actually wrote a short paper on this a few years ago. I tend to add this utility and another for modifying the size to my own custom vectors to be able to get every last bit of performance where I can. You can get an easy 100% performance swing with this over emplace_back.
Even better is actually this pattern:
std::vector<char> a_pseudo_string;
a_pseudo_string.reserve(32);
char *danger_ptr = a_pseudo_string.data();
for (size_t i = 0; i < 32; i++) {
danger_ptr[i] = i+32;
}
// an incredibly awkward use of assign
a_psudo_string.assign(danger_ptr, danger_ptr+32);
Where assign also can get an unchecked variation. Since the internals of svector already has
template <direction D>
void set_size(size_t s) {
if constexpr (D == direction::direct) {
set_direct_and_size(s);
} else {
indirect()->size(s);
}
}
The second is already possible*.
Alternatively could be named unchecked_emplace_back.
I think this is what David Stone proposes ("append_from_capacity") in his CppCon talk
I googled a bit, I think an API like here in P1010R1 sounds good to me:
-
T* uninitialized_data()- Returns a pointer to storage that would back elements [size(), capacity()). -
insert_from_capacity(size_type n)Appends n elements from capacity. The application must have initialized the storage backing these elements otherwise the behavior is undefined. (or name itappend_from_capacity)
It's not a new idea, I was about to submit my paper to the standards committee but someone mentioned that a similar idea was rejected years beforehand. I'm not surprised though that it keeps getting brought up, it's a pretty easy performance improvement when you know you can use it.
insert_from_capacity's a strange name since insert's used to place data essentially "anywhere". I tend to call mine append_unchecked or unchecked_append since for whatever reason append is not defined for vector, but is for strings.
Here's some old nanobench results from my draft from 2 years ago.
| MSVC x64 debug
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 91.60 | 10,917,030.57 | 0.5% | 0.00 | `emplace_back`
| 31.82 | 31,421,838.18 | 0.4% | 0.00 | `unchecked_emplace_back`
| 9.04 | 110,663,983.90 | 3.4% | 0.00 | `assign`
| 8.65 | 115,606,936.42 | 0.1% | 0.00 | `self_assign`
| MSVC x64 release
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 0.97 | 1,026,392,961.88 | 0.1% | 0.00 | `emplace_back`
| 0.89 | 1,126,637,554.59 | 0.1% | 0.00 | `unchecked_emplace_back`
| 0.57 | 1,741,245,136.19 | 0.1% | 0.00 | `assign`
| 0.53 | 1,899,335,232.67 | 0.0% | 0.00 | `self_assign`
| Clang x64 debug
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 21.06 | 47,483,380.82 | 0.1% | 0.00 | `emplace_back`
| 20.40 | 49,019,607.84 | 0.1% | 0.00 | `unchecked_emplace_back`
| 2.78 | 359,640,359.64 | 0.1% | 0.00 | `assign`
| 2.71 | 369,290,573.37 | 0.0% | 0.00 | `self_assign`
| Clang x64 release
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.77 | 564,440,263.41 | 0.1% | 0.00 | `emplace_back`
| 0.80 | 1,256,544,502.62 | 0.2% | 0.00 | `unchecked_emplace_back`
| 0.55 | 1,803,127,874.89 | 0.0% | 0.00 | `assign`
| 0.53 | 1,873,767,258.38 | 0.0% | 0.00 | `self_assign`
It somewhat depends on the complexity of the type of course, but as you can see using assign's WAY faster. I think the strange result was actually that MSVC seemingly out optimized clang at the time for regular emplace_back. I can't remember why I think it was that msvc managed to optimize with SIMD instructions, when the conditional check was removed from both I think the optimizers got a fairly decent shot optimizing the append.
Nice nanobench benchmarks @Andersama :smile: I also don't like insert_from_capacity. Digging more into standardization, the last status of P1010 is that it should be revised, based on the accepted resize_and_overwrite: https://github.com/cplusplus/papers/issues/322
This will be in C++23 for std::string, and It would probably be best to have the same API for vector: https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite
Taking a look at the cppreference example
food.resize_and_overwrite(
resize_to,
[food_size = food.size()](char* p, [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) n) noexcept -> [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) {
// p[0]..p[n] is the assignable range
// p[0]..p[min(n, food_size) - 1] is the readable range
// (contents initially equal to the original string)
// Debug print:
[std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); n: " << n << '\n';
// Copy fruits to the buffer p while there is enough space.
char* first = p + food_size;
for (char* const end = p + n; const [std::string_view](https://en.cppreference.com/w/cpp/string/basic_string_view) fruit : fruits) {
char* last = first + fruit.size() + 1;
if (last > end)
break;
*first++ = ' ';
std::[ranges::copy](https://en.cppreference.com/w/cpp/algorithm/ranges/copy)(fruit, first);
first = last;
}
const auto final_size { static_cast<[std::size_t](https://en.cppreference.com/w/cpp/types/size_t)>(first - p) };
// Debug print:
[std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); final_size: " << final_size << '\n';
[assert](https://en.cppreference.com/w/cpp/error/assert)(final_size <= n);
return final_size; // Return value is the actual new length
// of the string, must be in range 0..n
});
I have the distinct feeling the signature is confusing. If I were to write the callback signature I'd include the old size in the parameters, or alternatively pass the original vector into the callback. It looks like the current signature is passing the entire range via the pointer and the new capacity essentially forcing the user to create a lambda to capture the size.
I'd consider for example:
food.lazy_append(new_capacity, [](char *ptr, size_t old_size, size_t new_capacity){
// now ptr[0..old_size-1] is still available but we don't have to be capturing it ourselves
// bonus we can use c functions as a callback meaning this'd be backwards compatible with older c++ standards and could be done without a template
});
//bonus, the implementation's probably the simplest thing I can think of
void lazy_append(size_t new_capacity, void(*callback)(value_type *, size_t, size_t)) {
reserve(new_capacity);
callback(data(), size(), capacity());
}
Now that I'm thinking about it, since I gaffed on the initially commit, this kind of vector implementation should likely use a different variation of unchecked emplace back given which mode the vector's in.