PDMP3
Code optimization and cleanup
Lots of room for improvement.
- Floating-point constants should be written as float literals (0.0f) rather than double literals (0.0), so float expressions are not promoted to double and float ops stay fast
- Slow math ops like sin, cos and pow should be offloaded to lookup tables where possible:
  a) one version with init code, to reduce binary size at the cost of startup time
  b) another version with static const lookup tables, for faster startup at the cost of size
  c) some areas just need the math simplified for easier calculation: multiplying by a precalculated 1/float is faster than dividing by a float, and some expressions need their ops rearranged so constants can be merged and separated from variables
- Unroll some loops into direct return/initialization statements (fewer memcpy lookalikes)
- Functions should take pointers to the data they work on instead of relying on globals behind a some_func(void) signature
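A minimal sketch of how several of the points above could fit together, assuming a tiny 9-entry table and a made-up context struct. The names `requant_ctx`, `requant_init`, `requant_sample`, `pow43_table` and `POW43_TABLE_SIZE` are hypothetical and not taken from the PDMP3 source; the table size is chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size, kept tiny for illustration only. */
#define POW43_TABLE_SIZE 9

/* Option (b): static const table, no startup cost. Entry i holds i^(4/3).
 * All constants carry the f suffix so they stay single precision. */
static const float pow43_table[POW43_TABLE_SIZE] = {
    0.0f, 1.0f, 2.5198421f, 4.3267487f, 6.3496042f,
    8.5498797f, 10.9027236f, 13.3905183f, 16.0f
};

/* Hypothetical context struct: state is passed by pointer instead of
 * living in globals behind a some_func(void) signature. */
typedef struct {
    const float *pow43; /* lookup table for i^(4/3) */
    float inv_gain;     /* precomputed 1/gain, so the hot path multiplies */
} requant_ctx;

static void requant_init(requant_ctx *ctx, float gain) {
    ctx->pow43 = pow43_table;
    ctx->inv_gain = 1.0f / gain; /* the only division, done once */
}

/* Returns i^(4/3) / gain using one table lookup and one multiply. */
static inline float requant_sample(const requant_ctx *ctx, unsigned i) {
    assert(i < POW43_TABLE_SIZE);
    return ctx->pow43[i] * ctx->inv_gain;
}
```

Option (a) would look the same from the caller's side: `requant_init` would fill a writable table at startup instead of pointing at a const one, trading binary size for startup time.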
static inline float Requantize_Pow_43(unsigned x) returns x^(4/3). This could be simplified to 16(x/8)^(4/3) or 256(x/64)^(4/3), which means the lookup table could be reduced in size. Alternatively, pow(x,4.0f/3.0f) can be replaced with cbrt((x*x)*(x*x)) to cut the time roughly in half. The squaring and the cube root can also be combined using a variation of the fast inverse square root trick:
/* Description: returns x^(4/3)
 * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209)
 */
static inline float pow43opt2(float x) {
    if (x < 2) return x;
    else x *= x, x *= x; // pow(x,4)
    float a3, x2 = x + x;
    union {float f; unsigned i;} u = {x};
    u.i = u.i/3 + 0x2a517d3c; // ~cbrt(x)
    int accuracy_iterations = 2; // reduce for speed, increase for precision
    while (accuracy_iterations--) { // Lancaster iterations
        a3 = u.f*u.f*u.f;
        u.f *= (a3 + x2) / (a3 + a3 + x);
    }
    return u.f;
}
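
For reference, here is a short derivation of what the loop computes. This is a reading of the code above rather than something the original states. Write R = x^4 for the value held in `x` after the two squarings, and a_n for the current estimate in `u.f`. The update then has the form of Halley's method applied to f(a) = a^3 - R, which converges to the cube root of R:

```latex
a_{n+1} = a_n \,\frac{a_n^{3} + 2R}{2a_n^{3} + R},
\qquad
a_n \to R^{1/3} = (x^{4})^{1/3} = x^{4/3}
```

The table-size reduction mentioned earlier comes from factoring a perfect cube out of the argument:

```latex
x^{4/3} = 8^{4/3}\left(\frac{x}{8}\right)^{4/3} = 16\left(\frac{x}{8}\right)^{4/3},
\qquad
x^{4/3} = 64^{4/3}\left(\frac{x}{64}\right)^{4/3} = 256\left(\frac{x}{64}\right)^{4/3}
```

Since x/8 and x/64 are generally not integers, a reduced table would presumably need interpolation or a correction term for the remainder; that part is an assumption and is not covered by the original note.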