Results: 31 comments by Dougall Johnson

Yeah, ARM64 probably would benefit too as it has a signed-bitfield extraction instruction (SBFX).
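
For anyone unfamiliar with the operation, here is a rough sketch of signed-bitfield extraction in portable C. The helper name `extract_signed_bits` and its parameters are made up for illustration and are not from the code under discussion. Compilers targeting ARM64 can often lower this shift pair to a single SBFX, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: extract `width` bits of `x` starting at bit `lsb`
 * and sign-extend the result to 64 bits.
 * Preconditions: width >= 1 and lsb + width <= 64.
 * Assumes the compiler converts out-of-range unsigned values to signed by
 * wrapping and implements >> on negative values as an arithmetic shift.
 * Both are implementation-defined in C, but hold on mainstream compilers. */
static inline int64_t extract_signed_bits(uint64_t x, unsigned lsb, unsigned width)
{
    /* Move the field to the top of the word so its highest bit becomes the sign bit. */
    uint64_t top = x << (64 - lsb - width);

    /* Shift back down arithmetically, replicating the sign bit into the upper bits. */
    return (int64_t)top >> (64 - width);
}
```

For example, `extract_signed_bits(0xF0, 4, 4)` treats bits 4..7 as a 4-bit signed value and yields -1, while `extract_signed_bits(0x70, 4, 4)` yields 7.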

Oops, looks like counting bits was a mistake. I bumped the limit up to 15, which should reflect the maximum advance amount (7), plus the second 8-byte load. And removed...

Never mind – I was reading your benchmarks wrong. Too used to my own MB/s numbers rather than just seconds. Looks like this is a speedup on MSVC too, so I'll...

> oddly this needs workflow approval too. It looks like it ran some but is now asking again for approval.

Yeah, I'm guessing I'm still a "first-time contributor" and will...

Yeah - for now I'd lean towards "just slap objdump on it" (and call it objdump) as an easy and mostly-fine tradeoff? (Free for devs without binja licenses, and faster...

> This effect likely also exists on ARM processors, if any of them are capable of doing multiple vector additions in parallel. i.e. superscalar SIMD

Yeah, generally this is possible...

Ah, good point, yeah, I don't know why I didn't think of that.

Yeah, that's a good idea – I just wasn't aware of MTLBinaryArchive at the time. I haven't tried overwriting code in binary archives, but assuming that works it should be...

Good questions, good answers. I'm leaving this issue open to track adding this to the documentation.

Thanks! I'm not particularly interested in RVV – adding SEW and LMUL to the vector-length agnosticism would make it even trickier to illustrate. SME and Neon are higher priorities for...