Florian Lemaitre

Results 99 comments of Florian Lemaitre

I'm maybe a bit late on the subject, but I was totally fine with `@Vector` (except for the potential confusion with `std.Vector`). I don't think it is really worth to...

I would like to highlight an argument against consistency. Pointers and slices are "decorators" of any type, even user-defined ones. However, SIMD should most likely be restricted to primitive...

Here are my own answers to those questions:

> Do we need to support TLS of dynamic modules? (loaded with dlopen)

I would say yes, but I would prefer no....

One function where I did need constants is the fast square root reciprocal. I basically use the "fast square root reciprocal" from Quake, but extended to double precision, and...
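For readers unfamiliar with the trick, here is a minimal sketch of a double-precision variant of the Quake reciprocal square root. It is not the code from the comment above (which is truncated): the function name `rsqrt_fast` is hypothetical, the magic constant is the commonly cited 64-bit value, and a single Newton-Raphson step is an assumption chosen for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for a positive, finite, normal double.
 * Hypothetical helper for illustration; accuracy after one Newton step
 * is roughly a fraction of a percent. */
static double rsqrt_fast(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);               /* reinterpret the double as an integer */
    bits = 0x5FE6EB50C7B537A9ULL - (bits >> 1);   /* magic-constant initial guess */
    double y;
    memcpy(&y, &bits, sizeof y);                  /* back to floating point */
    y = y * (1.5 - 0.5 * x * y * y);              /* one Newton-Raphson refinement */
    return y;
}
```

The 64-bit constant is exactly the kind of value that has to be expressible as a vector constant once this routine is written with SIMD types.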

The benchmark I had at the time is super large, but this was part of a Cholesky factorization kernel on tiny matrices (like 3x3 or 5x5). In fact, to evaluate...

> Baseline: https://gcc.godbolt.org/z/w-j8QW, 64.97 cycles per loop iteration
> fast-math: https://gcc.godbolt.org/z/hVsk2B, 62.96 cycles per loop iteration
> fast-math + fma: https://gcc.godbolt.org/z/XRKt4C, 40.79 cycles per loop iteration

Beware that your baseline...

@nfrechette Using FMAs cannot be slower on recent hardware because the FMA instruction has exactly the same latency and throughput as a multiplication (latency 4c, throughput 2/c on Skylake). Basically,...

Note that `__builtin_*` functions are not really intrinsics, but compiler builtins. The difference is that a compiler builtin is provided by your compiler vendor, while intrinsics are defined by the...

As far as the compiler is concerned, WASM is the target platform and thus is no different from any actual platform/architecture. So if you need intrinsics for x86, you need...

The main problem I see is that floating point arithmetic is not associative. So the result will depend on the implementation. We could have the following:
- `f32x4.addHorizOrdered(x: v128) ->...
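The non-associativity is easy to reproduce with four lanes. The sketch below is scalar C, not WASM, and both function names are hypothetical stand-ins: one sums the lanes strictly left to right, the other uses the pairwise tree that a SIMD shuffle-and-add reduction would typically produce. It assumes IEEE 754 single precision with no fast-math flags.

```c
/* Strict left-to-right order: ((v0 + v1) + v2) + v3 */
static float add_horiz_ordered(const float v[4]) {
    float s = v[0];
    s = s + v[1];
    s = s + v[2];
    s = s + v[3];
    return s;
}

/* Pairwise tree order: (v0 + v1) + (v2 + v3) */
static float add_horiz_pairwise(const float v[4]) {
    float lo = v[0] + v[1];
    float hi = v[2] + v[3];
    return lo + hi;
}
```

With the lanes `{1e8f, 1.0f, -1e8f, 1.0f}`, the ordered sum absorbs the first `1.0f` into `1e8f`, cancels, and then adds the last lane, while the pairwise sum absorbs both `1.0f` values before cancelling. The two reductions therefore disagree on the same input, which is why an API has to state which order it guarantees.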