uint256: optimize Lsh, Rsh, SRsh
Description
Simplify codes and reduce jumps.
Test
go test ./...
returns
ok github.com/holiman/uint256 0.983s
Benchmark
go test -run - -bench sh$ -benchmem -count 10 >/tmp/old
go test -run - -bench sh$ -benchmem -count 10 >/tmp/new
benchstat old new
goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics
│ old │ new │
│ sec/op │ sec/op vs base │
Lsh/n_eq_0/big-16 30.73n ± 1% 30.92n ± 1% ~ (p=0.119 n=10)
Lsh/n_gt_192/big-16 40.80n ± 2% 40.52n ± 2% ~ (p=0.190 n=10)
Lsh/n_gt_128/big-16 40.86n ± 2% 40.72n ± 1% ~ (p=0.912 n=10)
Lsh/n_gt_64/big-16 38.52n ± 1% 38.74n ± 0% ~ (p=0.078 n=10)
Lsh/n_gt_0/big-16 37.91n ± 1% 37.92n ± 1% ~ (p=1.000 n=10)
Lsh/n_eq_0/uint256-16 1.776n ± 2% 1.772n ± 0% ~ (p=0.591 n=10)
Lsh/n_gt_192/uint256-16 2.212n ± 1% 1.565n ± 1% -29.23% (p=0.000 n=10)
Lsh/n_gt_128/uint256-16 2.429n ± 2% 1.989n ± 1% -18.10% (p=0.000 n=10)
Lsh/n_gt_64/uint256-16 3.541n ± 2% 2.432n ± 2% -31.32% (p=0.000 n=10)
Lsh/n_gt_0/uint256-16 4.438n ± 2% 2.462n ± 2% -44.52% (p=0.000 n=10)
Rsh/n_eq_0/big-16 34.28n ± 1% 34.33n ± 1% ~ (p=0.810 n=10)
Rsh/n_gt_192/big-16 16.91n ± 0% 16.90n ± 1% ~ (p=0.781 n=10)
Rsh/n_gt_128/big-16 32.67n ± 1% 32.61n ± 1% ~ (p=0.698 n=10)
Rsh/n_gt_64/big-16 36.87n ± 2% 37.30n ± 1% ~ (p=0.138 n=10)
Rsh/n_gt_0/big-16 38.12n ± 1% 37.83n ± 1% ~ (p=0.138 n=10)
Rsh/n_eq_0/uint256-16 1.579n ± 2% 1.596n ± 2% ~ (p=0.093 n=10)
Rsh/n_gt_192/uint256-16 1.778n ± 2% 1.784n ± 1% +0.34% (p=0.045 n=10)
Rsh/n_gt_128/uint256-16 2.453n ± 1% 2.001n ± 1% -18.43% (p=0.000 n=10)
Rsh/n_gt_64/uint256-16 3.536n ± 2% 2.429n ± 2% -31.32% (p=0.000 n=10)
Rsh/n_gt_0/uint256-16 4.200n ± 2% 2.459n ± 1% -41.45% (p=0.000 n=10)
geomean 9.424n 8.272n -12.23%
│ old │ new │
│ B/op │ B/op vs base │
Lsh/n_eq_0/big-16 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/big-16 96.00 ± 0% 96.00 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/big-16 96.00 ± 0% 96.00 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/big-16 80.00 ± 0% 80.00 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/big-16 80.00 ± 0% 80.00 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_eq_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/big-16 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/big-16 8.000 ± 0% 8.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/big-16 48.00 ± 0% 48.00 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/big-16 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/big-16 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
│ old │ new │
│ allocs/op │ allocs/op vs base │
Lsh/n_eq_0/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_eq_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_192/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_128/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_64/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Lsh/n_gt_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/big-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_eq_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_192/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_128/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_64/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
Rsh/n_gt_0/uint256-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
geomean ² +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (
36aedf5) to head (788a0fd). Report is 2 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #159 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 5 5
Lines 1649 1619 -30
=========================================
- Hits 1649 1619 -30
numbers from my machine look good:
[user@work uint256]$ benchstat bench.shifts.1 bench.shifts.2
name old time/op new time/op delta
Lsh/n_eq_0/uint256-8 3.91ns ±29% 3.93ns ±36% ~ (p=0.670 n=10+10)
Lsh/n_gt_192/uint256-8 5.11ns ±32% 3.23ns ±41% -36.78% (p=0.001 n=10+10)
Lsh/n_gt_128/uint256-8 6.90ns ±31% 6.14ns ±31% ~ (p=0.143 n=10+10)
Lsh/n_gt_64/uint256-8 9.43ns ±32% 6.23ns ±14% -33.94% (p=0.001 n=10+8)
Lsh/n_gt_0/uint256-8 12.1ns ±31% 8.1ns ±27% -33.38% (p=0.007 n=10+10)
Rsh/n_eq_0/uint256-8 3.85ns ±32% 4.80ns ± 0% +24.93% (p=0.000 n=10+8)
Rsh/n_gt_192/uint256-8 4.48ns ±33% 3.49ns ±36% ~ (p=0.063 n=10+10)
Rsh/n_gt_128/uint256-8 5.23ns ±19% 6.29ns ±34% ~ (p=0.515 n=8+10)
Rsh/n_gt_64/uint256-8 10.5ns ±36% 8.0ns ±29% -23.47% (p=0.011 n=10+10)
Rsh/n_gt_0/uint256-8 12.6ns ±40% 10.0ns ± 0% -20.61% (p=0.033 n=10+8)
Numbers aside, the code in this PR is a lot nicer than the existing!
Why n_eq_0 regress in your testing? The behavior is the same. Odd branch prediction?!
Why n_eq_0 regress in your testing? The behavior is the same.
Yeah, I don't know. Benchmarking is a bit black magic, getting good reproducible and "sane" results isn't always easy.