bevy icon indicating copy to clipboard operation
bevy copied to clipboard

Remove redundent information and optimize dynamic allocations in `Table`

Open Adamkob12 opened this issue 1 year ago • 8 comments

Objective

  • fix #12853
  • Make Table::allocate faster

Solution

The PR consists of multiple steps:

  1. For the component data: create a new data-structure that's similar to BlobVec but doesn't store len & capacity inside of it: "BlobArray" (name suggestions welcome)
  2. For the Tick data: create a new data-structure that's similar to ThinSlicePtr but supports dynamic reallocation: "ThinArrayPtr" (name suggestions welcome)
  3. Create a new data-structure that's very similar to Column that doesn't store len & capacity inside of it: "ThinColumn"
  4. Adjust the Table implementation to use ThinColumn instead of Column

The result is that only one set of len & capacity is stored in Table, in Table::entities

Notes Regarding Performance

Apart from shaving off some excess memory in Table, the changes have also brought noteworthy performance improvements: The previous implementation relied on Vec::reserve & BlobVec::reserve, but that redundantly repeated the same if statement (capacity == len). Now that check could be made at the Table level because the capacity and length of all the columns are synchronized; saving N branches per allocation. The result is a respectable performance improvement per every Table::reserve (and subsequently Table::allocate) call.

I'm hesitant to give exact numbers because I don't have a lot of experience in profiling and benchmarking, but these are the results I got so far:

add_remove_big/table benchmark after the implementation:

after_add_remove_big_table

add_remove_big/table benchmark in main branch (measured in comparison to the implementation):

main_add_remove_big_table

add_remove_very_big/table benchmark after the implementation:

after_add_remove_very_big

add_remove_very_big/table benchmark in main branch (measured in comparison to the implementation):

main_add_remove_very_big

cc @james7132 to verify


Changelog

  • New data-structure that's similar to BlobVec but doesn't store len & capacity inside of it: BlobArray
  • New data-structure that's similar to ThinSlicePtr but supports dynamic allocation:ThinArrayPtr
  • New data-structure that's very similar to Column that doesn't store len & capacity inside of it: ThinColumn
  • Adjust the Table implementation to use ThinColumn instead of Column
  • New benchmark: add_remove_very_big to benchmark the performance of spawning a lot of entities with a lot of components (15) each

Migration Guide

Table now uses ThinColumn instead of Column. That means that methods that previously returned Column, will now return ThinColumn instead.

ThinColumn has a much more limited and low-level API, but you can still achieve the same things in ThinColumn as you did in Column. For example, instead of calling Column::get_added_tick, you'd call ThinColumn::get_added_ticks_slice and index it to get the specific added tick.

Adamkob12 avatar Apr 11 '24 19:04 Adamkob12

looks like a segfault in run-examples-macos-metal ci? needs investigation e: fixed

Adamkob12 avatar Apr 11 '24 21:04 Adamkob12

For the component data: create a new data-structure that's similar to BlobVec but doesn't store len & capacity inside of it: "BlobArray" (name suggestions welcome) (This type makes a compile-time destinction between ZST types and non-ZST types)

I like UntrackedBlobVec for this concept -- "thin" in Rust contexts typically refers to pointer sizes, but the data structures aren't pointers. And IMO, "untracked" is clearer about the intention. By the same logic, I like UntrackedVec and UntrackedColumn over ThinArrayPtr and ThinColumn (assuming ThinArrayPtr is implemented like a Vec).

vertesians avatar Apr 11 '24 21:04 vertesians

I'm not sure thee ZST separation is actually worth it. The cost of the branch is only needed up on BlobVec::reserve_exact, which is only called when the entities Vec is reallocated, which occurs at typical power-of-two capacity threshold. The extra SparseSet likely negates any memory savings from the redundant length and capacity cuts. It might be worth benchmaking.

I did some benchmarking and the ZST separation indeed has negligible performance differences, it looks like the branch predictor is really pulling it's weight. So I merged the ZST and non-ZST columns. Thanks for calling this out!

Adamkob12 avatar Apr 19 '24 16:04 Adamkob12

Due to the amount of unsafe in this PR, merging now is a potential risk for 0.14, moving this to merge early on in the 0.15 cycle.

james7132 avatar May 18 '24 22:05 james7132

Unblocked 🎉

Adamkob12 avatar Jul 05 '24 08:07 Adamkob12

@Adamkob12 Do you have anytime to resolve conflicts to help this pr moving forword ? I don't want to miss out on such an excellent PR in 0.15 especially it has been approved by james7132 and james

re0312 avatar Aug 15 '24 13:08 re0312

@Adamkob12 sorry to leave you waiting. Once merge conflicts are resolved I'll get this in :)

alice-i-cecile avatar Aug 19 '24 21:08 alice-i-cecile

No worries! I'll try to resolve the conflicts soon.

Adamkob12 avatar Aug 21 '24 10:08 Adamkob12

Fixed the conflicts, note for reviewers: pay extra attention to #[cfg(feature = "track_change_detection")] adjacent code, It's the first time I've seen this feature and integrating it into the PR was the majority of the work so it's very possible that I missed some small details in documentation or something like that.

Also we should probably bench this again to be safe

Adamkob12 avatar Aug 29 '24 11:08 Adamkob12

Your PR increases Bevy Minimum Supported Rust Version. Please update the rust-version field in the root Cargo.toml file.

github-actions[bot] avatar Aug 31 '24 11:08 github-actions[bot]