Remove dynamic memory allocations inside lowest-level assembly functions
This will allow the lowest-level assembly functions to be executed efficiently for small batches of cells.
Also makes a number of simplifications following a careful review of the code.
It would also be neat to check that no heap allocation happens in certain scopes. Clang (>=19) has the [[clang::nonallocating]] attribute, see https://clang.llvm.org/docs/FunctionEffectAnalysis.html#the-nonblocking-and-nonallocating-attributes
Checking it requires the extra -Wfunction-effects warning flag.
That flag could be passed to the clang-tidy linting run in https://github.com/FEniCS/dolfinx/pull/3722. clang-tidy hooks into an LLVM-based compile anyway.
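
For illustration, a minimal sketch of how such an annotation might look on a small kernel. Everything here is hypothetical and not taken from the dolfinx code base: the `NONALLOCATING` macro, the `max_dofs` bound and the `scaled_squares` function are made-up names. The macro expands to nothing on compilers that do not know the attribute, so the code still builds elsewhere.

```cpp
#include <cstddef>

// Hypothetical helper macro: use the attribute only where Clang reports
// support for it, and expand to nothing on other compilers.
#if defined(__clang__) && defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::nonallocating)
#define NONALLOCATING [[clang::nonallocating]]
#endif
#endif
#ifndef NONALLOCATING
#define NONALLOCATING
#endif

// Hypothetical compile-time bound on the local vector size.
constexpr std::size_t max_dofs = 8;

// Hypothetical cell kernel: b[i] += scale * x[i]^2 for i < n.
// Scratch storage is a fixed-size stack array, so the function never
// touches the heap. Returns false (and leaves b untouched) if n exceeds
// the bound, instead of falling back to a heap allocation.
// The attribute is a function type attribute, so it follows the
// parameter list and the noexcept specifier.
bool scaled_squares(const double* x, std::size_t n, double scale,
                    double* b) noexcept NONALLOCATING
{
  if (n > max_dofs)
    return false;

  double scratch[max_dofs];
  for (std::size_t i = 0; i < n; ++i)
    scratch[i] = x[i] * x[i];
  for (std::size_t i = 0; i < n; ++i)
    b[i] += scale * scratch[i];
  return true;
}
```

With a sufficiently recent Clang and -Wfunction-effects enabled, replacing the stack array with something like a std::vector inside the annotated function is the kind of change the analysis is intended to diagnose. Whether a given library call is accepted depends on what the analysis can see or infer, so this would need trying out on the real kernels.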
Will update and merge in smaller pieces.