FiniteStateEntropy

Architecture specific optimizations

Open · codekatana opened this issue 8 years ago · 6 comments

Hello, would it be possible to add ARM SIMD (NEON) routines to the huff0 and/or FSE encode/decode paths? That would let them run a bit faster on a Raspberry Pi.

codekatana, Oct 04 '17 13:10

Why not just compile it with clang and tell it to vectorize the loops?

MarcusJohnson91, Oct 04 '17 15:10
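As a rough illustration of the "let clang vectorize the loops" suggestion, here is a minimal sketch that is not taken from FSE or huff0: a hypothetical helper `sum_counts` that totals a table of symbol counts. A plain reduction loop like this is the kind of code clang's loop vectorizer can turn into SIMD instructions (NEON when targeting a NEON-capable ARM CPU; on 32-bit ARM that typically means passing `-mfpu=neon`) at `-O2`/`-O3`, and `-Rpass=loop-vectorize` asks clang to report which loops it actually vectorized. The `#pragma clang loop` hint is clang-specific, so it is guarded and skipped on other compilers. Whether the real huff0/FSE hot loops vectorize this way is not established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FSE/huff0): sum a table of symbol counts.
   Independent, sequential additions like these are a good fit for clang's
   loop vectorizer, which can emit NEON code when targeting ARM. */
static uint32_t sum_counts(const uint32_t *counts, size_t n)
{
    uint32_t total = 0;
#if defined(__clang__)
    /* Clang-only hint; other compilers never see this pragma. */
#pragma clang loop vectorize(enable) interleave(enable)
#endif
    for (size_t i = 0; i < n; i++) {
        total += counts[i];
    }
    return total;
}
```

A possible invocation on the Pi would be something like `clang -O3 -mfpu=neon -Rpass=loop-vectorize -c file.c`, then checking the remarks (or the generated assembly) to see whether the loop was vectorized.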

Indeed, that's a nice way to do it. However, wouldn't it be nicer to have some hand-coded assembly, just like BLAS libraries (e.g. OpenBLAS) do?

codekatana, Oct 05 '17 05:10

It's a non-trivial amount of work, with no guarantee of success. I'm certainly open to a patch if someone wants to try it.

Cyan4973, Oct 05 '17 07:10

I agree, Yann. I have been reading through the huff_* and fse* files to understand the code and identify candidate areas. I have also been reading your blog to understand zstd and find a part that could be accelerated with SIMD on ARM. I would very much appreciate any pointers.

codekatana, Oct 05 '17 07:10

@codekatana No. If it were my repo, I'd want to keep the codebase as clean as possible.

MarcusJohnson91, Oct 05 '17 09:10

@bumblebritches57 Yes, I understand. Assembly tends to be hard to read and maintain, but in some situations it gives good results. That's why BLAS libraries do their calculations in assembly rather than in a high-level language.

codekatana, Oct 05 '17 09:10