uint256 uint256: optimize Lsh, Rsh, SRsh

Description

Simplify codes and reduce jumps.

Test

go test ./...

returns

ok  	github.com/holiman/uint256	0.983s

Benchmark

go test -run - -bench sh$ -benchmem -count 10 >/tmp/old
go test -run - -bench sh$ -benchmem -count 10 >/tmp/new
benchstat old new

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                        │     old     │                 new                 │
                        │   sec/op    │   sec/op     vs base                │
Lsh/n_eq_0/big-16         30.73n ± 1%   30.92n ± 1%        ~ (p=0.119 n=10)
Lsh/n_gt_192/big-16       40.80n ± 2%   40.52n ± 2%        ~ (p=0.190 n=10)
Lsh/n_gt_128/big-16       40.86n ± 2%   40.72n ± 1%        ~ (p=0.912 n=10)
Lsh/n_gt_64/big-16        38.52n ± 1%   38.74n ± 0%        ~ (p=0.078 n=10)
Lsh/n_gt_0/big-16         37.91n ± 1%   37.92n ± 1%        ~ (p=1.000 n=10)
Lsh/n_eq_0/uint256-16     1.776n ± 2%   1.772n ± 0%        ~ (p=0.591 n=10)
Lsh/n_gt_192/uint256-16   2.212n ± 1%   1.565n ± 1%  -29.23% (p=0.000 n=10)
Lsh/n_gt_128/uint256-16   2.429n ± 2%   1.989n ± 1%  -18.10% (p=0.000 n=10)
Lsh/n_gt_64/uint256-16    3.541n ± 2%   2.432n ± 2%  -31.32% (p=0.000 n=10)
Lsh/n_gt_0/uint256-16     4.438n ± 2%   2.462n ± 2%  -44.52% (p=0.000 n=10)
Rsh/n_eq_0/big-16         34.28n ± 1%   34.33n ± 1%        ~ (p=0.810 n=10)
Rsh/n_gt_192/big-16       16.91n ± 0%   16.90n ± 1%        ~ (p=0.781 n=10)
Rsh/n_gt_128/big-16       32.67n ± 1%   32.61n ± 1%        ~ (p=0.698 n=10)
Rsh/n_gt_64/big-16        36.87n ± 2%   37.30n ± 1%        ~ (p=0.138 n=10)
Rsh/n_gt_0/big-16         38.12n ± 1%   37.83n ± 1%        ~ (p=0.138 n=10)
Rsh/n_eq_0/uint256-16     1.579n ± 2%   1.596n ± 2%        ~ (p=0.093 n=10)
Rsh/n_gt_192/uint256-16   1.778n ± 2%   1.784n ± 1%   +0.34% (p=0.045 n=10)
Rsh/n_gt_128/uint256-16   2.453n ± 1%   2.001n ± 1%  -18.43% (p=0.000 n=10)
Rsh/n_gt_64/uint256-16    3.536n ± 2%   2.429n ± 2%  -31.32% (p=0.000 n=10)
Rsh/n_gt_0/uint256-16     4.200n ± 2%   2.459n ± 1%  -41.45% (p=0.000 n=10)
geomean                   9.424n        8.272n       -12.23%

                        │     old      │                 new                 │
                        │     B/op     │    B/op     vs base                 │
Lsh/n_eq_0/big-16         64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/big-16       96.00 ± 0%     96.00 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/big-16       96.00 ± 0%     96.00 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/big-16        80.00 ± 0%     80.00 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/big-16         80.00 ± 0%     80.00 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_eq_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/big-16         64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/big-16       8.000 ± 0%     8.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/big-16       48.00 ± 0%     48.00 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/big-16        64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/big-16         64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                        │     old      │                 new                 │
                        │  allocs/op   │ allocs/op   vs base                 │
Lsh/n_eq_0/big-16         1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/big-16       1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/big-16       1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/big-16        1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/big-16         1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_eq_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/big-16         1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/big-16       1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/big-16       1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/big-16        1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/big-16         1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/uint256-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Apr 29 '24 11:04 AaronChen0

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (36aedf5) to head (788a0fd). Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #159   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1649      1619   -30     
=========================================
- Hits          1649      1619   -30

Apr 29 '24 11:04 codecov[bot]

numbers from my machine look good:

[user@work uint256]$ benchstat bench.shifts.1 bench.shifts.2
name                    old time/op  new time/op  delta
Lsh/n_eq_0/uint256-8    3.91ns ±29%  3.93ns ±36%     ~     (p=0.670 n=10+10)
Lsh/n_gt_192/uint256-8  5.11ns ±32%  3.23ns ±41%  -36.78%  (p=0.001 n=10+10)
Lsh/n_gt_128/uint256-8  6.90ns ±31%  6.14ns ±31%     ~     (p=0.143 n=10+10)
Lsh/n_gt_64/uint256-8   9.43ns ±32%  6.23ns ±14%  -33.94%  (p=0.001 n=10+8)
Lsh/n_gt_0/uint256-8    12.1ns ±31%   8.1ns ±27%  -33.38%  (p=0.007 n=10+10)
Rsh/n_eq_0/uint256-8    3.85ns ±32%  4.80ns ± 0%  +24.93%  (p=0.000 n=10+8)
Rsh/n_gt_192/uint256-8  4.48ns ±33%  3.49ns ±36%     ~     (p=0.063 n=10+10)
Rsh/n_gt_128/uint256-8  5.23ns ±19%  6.29ns ±34%     ~     (p=0.515 n=8+10)
Rsh/n_gt_64/uint256-8   10.5ns ±36%   8.0ns ±29%  -23.47%  (p=0.011 n=10+10)
Rsh/n_gt_0/uint256-8    12.6ns ±40%  10.0ns ± 0%  -20.61%  (p=0.033 n=10+8)

Numbers aside, the code in this PR is a lot nicer than the existing!

May 06 '24 07:05 holiman

Why n_eq_0 regress in your testing? The behavior is the same. Odd branch prediction?!

May 06 '24 08:05 AaronChen0

Why n_eq_0 regress in your testing? The behavior is the same.

Yeah, I don't know. Benchmarking is a bit black magic, getting good reproducible and "sane" results isn't always easy.

May 06 '24 08:05 holiman