[BUG] Performance regression with VBD integrator
Bug Description
Hi,
Since the collision response in integrator_vbd.py was refactored, we're seeing a 50-100% slowdown in overall simulation time. With the example_cloth_self_contact.py test, per-frame sim time grows from 0.19ms to 0.28ms on my GPU. For the more complex meshes we're testing with, which have tens of thousands of vertices, the overall simulation is nearly twice as slow as before.

From what I can tell, much of the extra time is spent in VBD_accumulate_contact_force_and_hessian. This kernel is much more expensive than its VBD_accumulate_contact_force_and_hessian_no_self_contact counterpart, even when no self-contacts are found. I'm guessing this is due to either a high thread count or register spilling in the more complex kernel, but I haven't taken the time to do a deep dive to confirm.
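For reference, a minimal sketch of how a relative slowdown figure can be computed from before/after per-frame timings. It assumes both timings are per-frame wall-clock times in the same unit; the helper name `relative_slowdown` is hypothetical and not part of the VBD integrator:

```python
def relative_slowdown(before_ms: float, after_ms: float) -> float:
    """Return the fractional increase in per-frame time (0.5 means 50% slower).

    Hypothetical helper: assumes both arguments are per-frame times in the same unit.
    """
    if before_ms <= 0:
        raise ValueError("baseline time must be positive")
    return (after_ms - before_ms) / before_ms


# Per-frame timings reported above for example_cloth_self_contact.py
print(f"{relative_slowdown(0.19, 0.28):.0%}")  # prints "47%"
```

By this measure the cloth example is roughly 47% slower, at the low end of the reported range.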
Thanks, Cliff
System Information
No response