
Code optimization and cleanup

Open technosaurus opened this issue 10 years ago • 1 comments

Lots of room for improvement.

  1. Floating-point constants should be written as float (0.0f) rather than double (0.0), so float expressions are not promoted to double and float ops stay fast
  2. slow math ops like sin, cos and pow should be offloaded to lookup tables where possible
     a. one version with init code, to reduce binary size at the cost of startup time
     b. another version with static const lookup tables, for faster startup at the cost of size
     c. some areas just need the math simplified for easier calculation:
        - multiplying by a precalculated 1/float is faster than dividing by a float
        - some things need ops rearranged so constants can be merged and separated from variables
  3. unroll some loops into direct returns or initializers (fewer memcpy lookalikes)
  4. functions should take pointers instead of using globals and some_func(void)

technosaurus, Feb 28 '15 17:02

static inline float Requantize_Pow_43(unsigned x) returns x^(4/3). This could be simplified to 16*(x/8)^(4/3) or 256*(x/64)^(4/3), which means the lookup table could be reduced in size. Alternatively, pow(x, 4.0f/3.0f) can be replaced with cbrt((x*x)*(x*x)) to cut the time roughly in half. The two steps can also be combined using a variation of the fast inverse square root trick:

/* Description: returns x^(4/3)
 * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209)
 * assumes IEEE 754 float and 32-bit unsigned
 */
static inline float pow43opt2(float x) {
  if (x<2) return x;                    //0 and 1 map to themselves
  x*=x; x*=x;                           //pow(x,4)
  float a3,x2=x+x;                      //x2 = 2*x^4
  union {float f; unsigned i;} u = {x}; //reinterpret the float bits as an integer
  u.i = u.i/3 + 0x2a517d3c;             //initial guess, ~cbrt(x)
  int accuracy_iterations=2;            //reduce for speed, increase for precision
  while (accuracy_iterations--){        //Lancaster iterations (same update as Halley's method for cube roots)
    a3=u.f*u.f*u.f;
    u.f *= (a3 + x2) / (a3 + a3 + x);
  }
  return u.f;
}

technosaurus, May 28 '15 13:05