Iakov Polyak comments

Results: 17 comments of Iakov Polyak

Specifying version of cppcheck fails on MacOS and Linux

As a follow-up: currently `cppcheck-2.7` is being installed for `ubuntu-latest`. Is there still no way to specify an explicit (more recent) version? Thanks!

Porting X25519 to OpenSSL?

Thank you very much! Yes, I already suspected that in its current state there might not be much performance gain from simply porting the existing assembly implementation, if one can...

Porting X25519 to OpenSSL?

> Then something is wrong, fe64 subroutines should have been called

I am sorry, you are absolutely right: the fe51 routines were called on Arm (with no assembly implementation), I looked...
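As background for the fe51/fe64 distinction: the names suggest field elements modulo p = 2^255 - 19 stored as five 51-bit limbs and four 64-bit limbs, respectively. The Python sketch below illustrates only the radix-2^51 idea (schoolbook product, then folding everything of weight 2^255 or more back with the factor 19). The helper names `to_fe51`, `from_fe51` and `fe51_mul` are hypothetical and this is not OpenSSL's code.

```python
# Illustrative radix-2^51 ("fe51"-style) multiplication modulo p = 2^255 - 19.
# Hypothetical helpers; not taken from OpenSSL.
P = 2**255 - 19
MASK51 = (1 << 51) - 1

def to_fe51(x):
    """Split 0 <= x < 2^255 into five 51-bit limbs, least significant first."""
    return [(x >> (51 * i)) & MASK51 for i in range(5)]

def from_fe51(h):
    """Recombine limbs into an integer (not necessarily fully reduced mod p)."""
    return sum(limb << (51 * i) for i, limb in enumerate(h))

def fe51_mul(f, g):
    """Schoolbook limb product with reduction by 2^255 == 19 (mod p)."""
    t = [0] * 5
    for i in range(5):
        for j in range(5):
            k = i + j
            if k < 5:
                t[k] += f[i] * g[j]
            else:
                # 2^(51*k) = 2^255 * 2^(51*(k-5)) == 19 * 2^(51*(k-5)) (mod p)
                t[k - 5] += 19 * f[i] * g[j]
    # Carry propagation so that each limb drops back below 2^51.
    carry = 0
    for k in range(5):
        t[k] += carry
        carry = t[k] >> 51
        t[k] &= MASK51
    # The carry out of the top limb has weight 2^255, i.e. 19 (mod p).
    t[0] += 19 * carry
    carry = t[0] >> 51
    t[0] &= MASK51
    t[1] += carry
    return t
```

With reduced inputs, each accumulated column stays below 2^109, which is why a C version of this scheme can accumulate in 128-bit integers using 64x64->128-bit multiplies.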

Porting X25519 to OpenSSL?

> It looks like Graviton3 has 4 128-bit vector units and it simply takes a pair of them to execute a 256-bit instruction.

Oh, I was unaware of that... It...

Porting X25519 to OpenSSL?

Right... Based on this analysis, I am indeed not very hopeful of seeing better numbers with SVE on existing Arm architectures, but I guess it would still be interesting...

Porting X25519 to OpenSSL?

Right, I can see the specialized routines using `vpmadd52luq` and `vpmadd52huq`... Quick googling suggests that SVE2 lacks direct equivalents of those :/ So maybe no luck there.
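To make the comparison concrete, here is a scalar Python model of what these two AVX-512 IFMA instructions compute per 64-bit lane, following Intel's description: multiply the low 52 bits of the two sources into a 104-bit product, then add its low 52 bits (`luq`) or its high 52 bits (`huq`) to the 64-bit accumulator. It is only a semantic model. The point it illustrates is that each instruction delivers half of a 52x52->104-bit product already aligned to a radix-2^52 limb, which an SVE2 port would presumably have to assemble from other multiply instructions.

```python
# Per-lane model of vpmadd52luq / vpmadd52huq on Python integers.
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def vpmadd52luq(acc, b, c):
    """acc + low 52 bits of (b[51:0] * c[51:0]), per lane, wrapping at 64 bits."""
    return [(a + (((x & MASK52) * (y & MASK52)) & MASK52)) & MASK64
            for a, x, y in zip(acc, b, c)]

def vpmadd52huq(acc, b, c):
    """acc + bits 52..103 of (b[51:0] * c[51:0]), per lane, wrapping at 64 bits."""
    return [(a + (((x & MASK52) * (y & MASK52)) >> 52)) & MASK64
            for a, x, y in zip(acc, b, c)]
```

Applying both to zero accumulators and recombining as `lo + (hi << 52)` recovers the full 104-bit product of the two 52-bit inputs.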

Porting X25519 to OpenSSL?

Another bummer: apparently SVE lacks widening integer multiplies producing 64-bit results (the equivalents of NEON `umull`/`umlal`); only SVE2 introduces them :( So there is probably no point porting to SVE, only to SVE2, for...
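Why these instructions matter: in a limb-based vector implementation such as a radix-2^26 Poly1305, every column of the limb product is a chain of 32x32->64-bit widening multiply-accumulates. The Python sketch below models NEON-style `umull`/`umlal` on two lanes and builds column 0 of h*r modulo 2^130 - 5, where limb products of weight 2^130 or more are folded back with the factor 5. The radix-2^26 layout and the helper `poly1305_d0` are illustrative assumptions, not taken from any particular implementation.

```python
# Two-lane model of 32x32 -> 64-bit widening multiply(-accumulate).
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def umull(a, b):
    """Per lane: 32x32 -> 64-bit unsigned product."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]

def umlal(acc, a, b):
    """Per lane: acc += 32x32 -> 64-bit product, wrapping at 64 bits."""
    return [(s + (x & MASK32) * (y & MASK32)) & MASK64
            for s, x, y in zip(acc, a, b)]

def poly1305_d0(h, r):
    """Hypothetical helper: column 0 of h*r (mod 2^130 - 5) in radix 2^26.

    h and r are lists of 5 limbs (< 2^26); each limb is a 2-lane list.
    Since 2^130 == 5 (mod p), pairs with i + j == 5 are multiplied by 5.
    """
    s = [[5 * x for x in limb] for limb in r]  # 5 * r_i still fits in 32 bits
    d0 = umull(h[0], r[0])
    d0 = umlal(d0, h[1], s[4])
    d0 = umlal(d0, h[2], s[3])
    d0 = umlal(d0, h[3], s[2])
    d0 = umlal(d0, h[4], s[1])
    return d0
```

The other four columns follow the same pattern, so a loop body is essentially 25 such multiply-accumulates per lane. SVE2 provides this operation as `umullb`/`umullt` and `umlalb`/`umlalt`, which work on the even ("bottom") and odd ("top") elements rather than on the low and high halves of a register.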

Horner's method for POLY1305-AVX2

This makes perfect sense, thank you very much for the explanation!

Horner's method for POLY1305-AVX2

May I ask another question regarding your (this time armv8) implementation of POLY1305? In the main loop, when contracting over `IN01` terms, you start from `IN01_2`, not `IN01_0` (starting from...
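For context on the Horner-method discussion, here is a minimal Python sketch showing that sequential Horner evaluation of the Poly1305 polynomial, acc = (acc + m) * r mod (2^130 - 5), gives the same result as a two-way interleaved evaluation that steps each lane by r^2 and applies r^2 and r at the end. It uses the real modulus but omits clamping, the final `s` addition, and any limb representation. It assumes an even number of blocks, and it does not claim to explain the specific `IN01_2` ordering in the armv8 code.

```python
# Horner vs. two-way interleaved evaluation of the Poly1305 polynomial.
P1305 = (1 << 130) - 5

def poly1305_horner(blocks, r):
    """Sequential Horner: acc = (acc + m) * r for each (already padded) block."""
    acc = 0
    for m in blocks:
        acc = (acc + m) * r % P1305
    return acc

def poly1305_two_way(blocks, r):
    """Same polynomial in two lanes stepping by r^2 (even block count assumed)."""
    assert len(blocks) % 2 == 0
    r2 = r * r % P1305
    a = b = 0
    for i in range(0, len(blocks), 2):
        a = (a * r2 + blocks[i]) % P1305      # earlier block of each pair
        b = (b * r2 + blocks[i + 1]) % P1305  # later block of each pair
    # Lane a lags lane b by one block, so it needs one extra factor of r.
    return (a * r2 + b * r) % P1305
```

For four blocks both functions compute m1*r^4 + m2*r^3 + m3*r^2 + m4*r, so the interleaving only changes when each power of r is applied, not the result.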

Horner's method for POLY1305-AVX2

Btw, I am seeing "only" a 5% drop in efficiency in my SVE2 version, which seems to me quite a good result, given this requirement of "distributing" packed input data...
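Assuming "distributing" refers to spreading each packed 16-byte input block across radix-2^26 limbs (one limb per vector lane position), the per-block work looks roughly like the Python sketch below: read the block little-endian, add the 2^128 pad bit used for full Poly1305 blocks, and cut the result into five 26-bit limbs. `block_to_limbs26` is a hypothetical helper name; vector code would do the same with shifts, masks and lane permutes.

```python
# Hypothetical helper: split one full 16-byte Poly1305 block into radix-2^26 limbs.
MASK26 = (1 << 26) - 1

def block_to_limbs26(block: bytes) -> list[int]:
    """Little-endian block + 2^128 pad bit, as five 26-bit limbs (LSB first)."""
    assert len(block) == 16
    m = int.from_bytes(block, "little") + (1 << 128)
    return [(m >> (26 * i)) & MASK26 for i in range(5)]
```

Because the top limb covers bits 104..129, the pad bit lands at bit 24 of the last limb; for an all-zero block the result is `[0, 0, 0, 0, 1 << 24]`.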