AABB recomputation for Skinned Meshes
Description
Right now the only ways to cull a skinned/animated mesh are to:
- precompute an AABB for every keyframe, union the AABBs of two keyframes when interpolating, and update that at runtime for each inverseBindPose set
- precompute/guess the AABB containing all animations and be conservative with culling that way
- set an obnoxiously large AABB (disable culling)
These approaches rely a lot on precomputation and break when animation blending or procedural keyframes are used.
Solution proposal
Either do the skinning in compute and accumulate the bounding box for an instance this way (requires a pause in the ILoDAndCullingSystem between drawcall prefix sum and bucket sorting scatter).
Or compute the AABB from the precomputed per-bone AABBs transformed by the bone's absolute modelspace and inverse bind pose transforms. Unsure how exactly to achieve this only for visible instances; one idea is to have a per-skeletonInstance-per-inverseBindPose cache of AABBs.
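A minimal CPU-side sketch of the per-bone approach, assuming each bone's reference AABB is stored in bind-pose model space and that the skinning matrix (bone global transform times inverse bind pose) has already been concatenated. The `AABB`, `Mat3x4`, `transformAABB` and `skinnedAABB` names are hypothetical and not engine API. Since a linearly blended vertex is a convex combination of its per-bone transformed positions, the union of the transformed per-bone boxes should be a conservative bound.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration only.
struct AABB {
    std::array<float, 3> min, max;
};

// Row-major 3x4 affine matrix: m[r][0..2] is the linear part, m[r][3] the translation.
using Mat3x4 = std::array<std::array<float, 4>, 3>;

// "Empty" box: the union with any other box yields that other box.
inline AABB emptyAABB() {
    const float inf = std::numeric_limits<float>::infinity();
    return {{inf, inf, inf}, {-inf, -inf, -inf}};
}

// AABB of an affinely transformed box, using the center/half-extent form
// (avoids transforming all 8 corners).
inline AABB transformAABB(const Mat3x4& m, const AABB& b) {
    AABB out;
    for (int r = 0; r < 3; ++r) {
        float center = m[r][3];
        float extent = 0.f;
        for (int c = 0; c < 3; ++c) {
            center += m[r][c] * 0.5f * (b.min[c] + b.max[c]);
            extent += std::fabs(m[r][c]) * 0.5f * (b.max[c] - b.min[c]);
        }
        out.min[r] = center - extent;
        out.max[r] = center + extent;
    }
    return out;
}

// Union of all per-bone reference AABBs after applying each bone's skinning matrix.
inline AABB skinnedAABB(const std::vector<Mat3x4>& skinningMatrices,
                        const std::vector<AABB>& boneAABBs) {
    assert(skinningMatrices.size() == boneAABBs.size());
    AABB result = emptyAABB();
    for (std::size_t i = 0; i < boneAABBs.size(); ++i) {
        // Bones that influence no vertices keep an empty box and are skipped.
        if (boneAABBs[i].min[0] > boneAABBs[i].max[0])
            continue;
        const AABB t = transformAABB(skinningMatrices[i], boneAABBs[i]);
        for (int a = 0; a < 3; ++a) {
            result.min[a] = std::min(result.min[a], t.min[a]);
            result.max[a] = std::max(result.max[a], t.max[a]);
        }
    }
    return result;
}
```

On the GPU the same loop body would presumably run per (skeleton instance, drawable) pair, with the result written to the cache mentioned above.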
Last option (maybe as an alternative to the compute skinning) would be to do atomicMin/Max in the vertex shader for the very first instance of a skeleton&inverseBindPose combo and compute the AABB "one frame behind". If we add some margin and invalidate it after a drastic keyframe change, it would probably be okay for culling.
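One practical wrinkle with the atomicMin/Max idea is that core GLSL integer atomics do not operate on floats, so the coordinates would likely need an order-preserving mapping to unsigned integers first. The sketch below emulates that in C++ with `std::atomic` as a stand-in for a shader storage buffer; `AtomicAABB`, `floatToOrderedUint`, `readBack` and the margin parameter are hypothetical names for illustration, and NaN inputs are assumed not to occur.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Map a float to a uint32 whose unsigned ordering matches the float ordering.
inline uint32_t floatToOrderedUint(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    // Negative floats: flip all bits. Non-negative floats: set the sign bit.
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// Inverse of floatToOrderedUint.
inline float orderedUintToFloat(uint32_t u) {
    u = (u & 0x80000000u) ? (u & 0x7fffffffu) : ~u;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// CAS loops standing in for the shader's atomicMin / atomicMax on uint.
inline void atomicMinU32(std::atomic<uint32_t>& dst, uint32_t v) {
    uint32_t cur = dst.load(std::memory_order_relaxed);
    while (v < cur && !dst.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {}
}
inline void atomicMaxU32(std::atomic<uint32_t>& dst, uint32_t v) {
    uint32_t cur = dst.load(std::memory_order_relaxed);
    while (v > cur && !dst.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {}
}

struct AtomicAABB {
    std::atomic<uint32_t> min[3];
    std::atomic<uint32_t> max[3];

    AtomicAABB() { clear(); }

    // Clear values: largest encoding for min, smallest for max.
    void clear() {
        for (int a = 0; a < 3; ++a) {
            min[a].store(0xffffffffu, std::memory_order_relaxed);
            max[a].store(0u, std::memory_order_relaxed);
        }
    }

    // What each vertex shader invocation would do with its skinned position.
    void addPoint(const float p[3]) {
        for (int a = 0; a < 3; ++a) {
            const uint32_t e = floatToOrderedUint(p[a]);
            atomicMinU32(min[a], e);
            atomicMaxU32(max[a], e);
        }
    }
};

// Decode last frame's box and pad it by a safety margin before culling with it.
inline void readBack(const AtomicAABB& box, float margin, float outMin[3], float outMax[3]) {
    for (int a = 0; a < 3; ++a) {
        outMin[a] = orderedUintToFloat(box.min[a].load(std::memory_order_relaxed)) - margin;
        outMax[a] = orderedUintToFloat(box.max[a].load(std::memory_order_relaxed)) + margin;
    }
}
```

How large the margin should be, and what counts as a "drastic" keyframe change that forces invalidation, would have to be tuned per asset.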
Additional context
Probably best to implement this after #252
Need a Skin/InverseBindPose Pool to hold the inverse bind poses [Elements O(skeletonNodes x drawablePoses)]
BoneTranslationTable to pair a skin with a set of nodes being the bones, and a result buffer for the global bone transforms. The allocations need to be contiguous. [Elements O(skeletonInstances x drawablePoses)]
BoneTranslationTableManager to run a compute dispatch to compute the bone positions
Multiple drawables (which we have due to material batching or index limits from meshletting) might use the same bone translation table entry but have radically different sets of vertices, and hence different per-bone AABBs.
So we need another pool to hold bind-pose reference AABBs per drawable and bone [Elements O(skeletonNodes x drawables)]
Then a temporary buffer of per-drawable-per-skeleton-instance modelspace AABBs is needed.
Every used (visible) BTT (BoneTranslationTable) entry will be updated. During the determination of visibility it's easy to make a list of unique BTT entry and drawable pairs, then use this list to compute the modelspace AABBs.
Once armed with this list, it's possible to either have 1 invocation process all the bones of an entry and accumulate trivially, or to prefix sum and workload balance so that 1 invocation processes 1 bone, at the cost of atomically accumulating the modelspace AABB (the destination addresses of which need to be cleared to an appropriate value beforehand). To properly evaluate our options we need a non-trivial scene with skeletons of different armature bone counts.
Then we'd need to go back to finish our occlusion culling of drawables before we prefix sum the drawcalls.
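To make the balanced variant more concrete, here is a rough CPU stand-in for the two bookkeeping steps: deduplicating the (BTT entry, drawable) pairs and mapping a flat invocation index back to a (pair, bone) work item through a prefix sum over bone counts. All names (`VisibleInstance`, `packKey`, `uniquePairs`, `exclusiveScan`, `locate`) are made up for this sketch. A GPU version would more likely deduplicate with flags or atomics during culling rather than a sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical record emitted for every visible skinned instance.
struct VisibleInstance {
    uint32_t bttEntry;
    uint32_t drawable;
};

// Pack a pair into one sortable 64-bit key (BTT entry in the high bits).
inline uint64_t packKey(uint32_t bttEntry, uint32_t drawable) {
    return (uint64_t(bttEntry) << 32) | drawable;
}

// Sort + unique as a simple stand-in for GPU-side deduplication.
inline std::vector<uint64_t> uniquePairs(const std::vector<VisibleInstance>& visible) {
    std::vector<uint64_t> keys;
    keys.reserve(visible.size());
    for (const auto& v : visible)
        keys.push_back(packKey(v.bttEntry, v.drawable));
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    return keys;
}

// Exclusive prefix sum over per-pair bone counts, with the total appended,
// so offsets.back() is the number of invocations to dispatch.
inline std::vector<uint32_t> exclusiveScan(const std::vector<uint32_t>& boneCounts) {
    std::vector<uint32_t> offsets(boneCounts.size() + 1, 0u);
    std::partial_sum(boneCounts.begin(), boneCounts.end(), offsets.begin() + 1);
    return offsets;
}

struct WorkItem {
    uint32_t pairIndex;  // index into the unique pair list
    uint32_t localBone;  // bone index within that pair's skeleton
};

// Binary search: which pair owns this invocation, and which of its bones?
inline WorkItem locate(const std::vector<uint32_t>& offsets, uint32_t invocation) {
    assert(invocation < offsets.back());
    // First offset strictly greater than the invocation; its predecessor owns it.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), invocation);
    const uint32_t pair = uint32_t(it - offsets.begin()) - 1u;
    return {pair, invocation - offsets[pair]};
}
```

In the non-atomic variant neither the scan nor `locate` is needed: invocation `i` simply loops over all bones of pair `i`, which is simpler but can leave the work badly balanced when bone counts differ a lot between skeletons.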
Before:
...
DrawcallCull
DrawcallPrefixSum
...
After Balanced:
...
DrawcallCull
DrawableSkeletonInstancePrefixSum_and_TranslationTableSkeletonInstanceDedup
IndirectTranslationTableTransformUpdate
DrawableSkeletonInstanceAABBUpdate
DrawcallRecull
DrawcallPrefixSum
...
After Non-Atomic:
...
DrawcallCull
IndirectTranslationTableTransformUpdate
DrawableSkeletonInstanceAABBUpdate
DrawcallRecull
DrawcallPrefixSum
...