Iakov Polyak
As a follow-up - currently `cppcheck-2.7` is being installed for `ubuntu-latest` - is there still no way to request an explicit (more modern) version? Thanks!
Thank you very much! Yes, I already suspected that in its current state there might not be much performance gain from simply porting the existing assembly implementation, if one can...
> Then something is wrong, fe64 subroutines should have been called

I am sorry, you are absolutely right - fe51 were called on Arm (with no assembly implementation), I looked...
> It looks like Graviton3 has 4 128-bit vector units and it simply takes a pair of them to execute a 256-bit instruction.

Oh, I was unaware of that... It...
Right... Given this analysis, I am indeed not very hopeful of seeing any better numbers with SVE on existing Arm architectures, but I guess it would still be interesting...
Right, I can see the specialized routines using `vpmadd52luq` and `vpmadd52huq`... Quick googling suggests that SVE2 lacks direct equivalents of those :/ So maybe no luck there.
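For reference, here is a rough scalar model of what a single 64-bit lane of `vpmadd52luq` / `vpmadd52huq` computes, just to make clear what an SVE2 replacement would have to emulate. The helper names `madd52lo` and `madd52hi` are made up for illustration, and the sketch assumes a compiler with `unsigned __int128` (GCC or Clang); it is not taken from the library code.

```c
#include <stdint.h>

/* Mask selecting the low 52 bits of a 64-bit lane. */
#define MASK52 ((1ULL << 52) - 1)

/* Hypothetical scalar model of one lane of vpmadd52luq:
 * multiply the low 52 bits of b and c (104-bit product),
 * then add the LOW 52 bits of the product to the 64-bit accumulator. */
static uint64_t madd52lo(uint64_t acc, uint64_t b, uint64_t c)
{
    unsigned __int128 prod = (unsigned __int128)(b & MASK52) * (c & MASK52);
    return acc + ((uint64_t)prod & MASK52);
}

/* Hypothetical scalar model of one lane of vpmadd52huq:
 * same 104-bit product, but the HIGH 52 bits (bits 52..103)
 * are added to the 64-bit accumulator. */
static uint64_t madd52hi(uint64_t acc, uint64_t b, uint64_t c)
{
    unsigned __int128 prod = (unsigned __int128)(b & MASK52) * (c & MASK52);
    return acc + (uint64_t)(prod >> 52);
}
```

Without a single instruction doing this, the 52-bit limb representation would presumably have to be replaced by one built on narrower widening multiplies.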
Another bummer - apparently SVE lacks the 64-bit integer widening multiplies (`umull`/`umlal`) - only SVE2 introduces the equivalents :( So probably no point porting to SVE, only to SVE2, for...
This makes perfect sense, thank you very much for the explanation!
May I ask another question regarding your (this time armv8) implementation of POLY1305? In the main loop, when contracting over `IN01` terms, you start from `IN01_2`, not `IN01_0` (starting from...
Btw, I am seeing "only" a 5% drop in efficiency in my SVE2 version, which seems to me quite a good result, given this requirement of "distributing" packed input data...