Use benchstat to analyse benchmarks automatically
time per operation

| operation / dimension | gonum time/op | vector time/op | delta |
| --- | --- | --- | --- |
| vector addition / 1 | 153ns ± 1% | 18ns ± 5% | -88.11% (p=0.029 n=4+4) |
| vector addition / 2 | 140ns ± 1% | 22ns ± 1% | -84.42% (p=0.029 n=4+4) |
| vector addition / 4 | 143ns ± 1% | 27ns ± 0% | -80.97% (p=0.029 n=4+4) |
| ... | ... | ... | ... |
| vector addition / 4194304 | 6.49ms ± 2% | 7.23ms ± 1% | +11.38% (p=0.029 n=4+4) |
| vector addition / 8388608 | 14.2ms ± 1% | 15.8ms ± 4% | +11.26% (p=0.029 n=4+4) |
| vector addition / 16777216 | 30.2ms ± 3% | 36.6ms ±10% | +21.30% (p=0.029 n=4+4) |
| [Geo mean] | 12.1µs | 7.6µs | -37.25% |

allocation per operation

| operation / dimension | gonum alloc/op | vector alloc/op | delta |
| --- | --- | --- | --- |
| vector addition / 1 | 104B ± 0% | 8B ± 0% | -92.31% (p=0.029 n=4+4) |
| vector addition / 2 | 112B ± 0% | 16B ± 0% | -85.71% (p=0.029 n=4+4) |
| vector addition / 4 | 128B ± 0% | 32B ± 0% | -75.00% (p=0.029 n=4+4) |
| ... | ... | ... | ... |
| vector addition / 4194304 | 33.6MB ± 0% | 33.6MB ± 0% | -0.00% (p=0.029 n=4+4) |
| vector addition / 8388608 | 67.1MB ± 0% | 67.1MB ± 0% | -0.00% (p=0.029 n=4+4) |
| vector addition / 16777216 | 134MB ± 0% | 134MB ± 0% | -0.00% (p=0.029 n=4+4) |
| [Geo mean] | 45.2kB | 32.8kB | -27.52% |

allocations per operation

| operation / dimension | gonum allocs/op | vector allocs/op | delta |
| --- | --- | --- | --- |
| vector addition / 1 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| vector addition / 2 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| vector addition / 4 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| ... | ... | ... | ... |
| vector addition / 4194304 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| vector addition / 8388608 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| vector addition / 16777216 | 3.00 ± 0% | 1.00 ± 0% | -66.67% (p=0.029 n=4+4) |
| [Geo mean] | 3.00 | 1.00 | -66.67% |

How about a GitHub workflow that comments on pull requests?
I was going to throw together a PR but I don't have a solution for testing assembly for different architectures.
General workflow was just:
- actions/checkout@v2 to checkout master
- run N iterations
- upload the results using https://github.com/actions/upload-artifact
- actions/checkout@v2 to checkout PR branch
- run N iterations
- download the artifacts previously uploaded
- compare using benchstat
- comment on the PR with the delta
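
For anyone who wants to try the comparison locally first, the steps above could be sketched roughly as the shell functions below. This is only a sketch under assumptions: `run_bench` and `compare_refs` are hypothetical helper names, `git`, `go` and `benchstat` are assumed to be on the PATH (benchstat can be installed with `go install golang.org/x/perf/cmd/benchstat@latest`), and the default of 5 samples is arbitrary. The artifact upload/download and the PR comment are specific to GitHub Actions and are left out here; the two result files simply live in a temporary directory.

```shell
#!/usr/bin/env bash
# Local sketch of the "benchmark two refs, then compare" flow.
# Assumes git, go and benchstat are available on PATH.
set -euo pipefail

# run_bench REF OUTFILE [COUNT]  (hypothetical helper)
# Checks out REF and writes COUNT benchmark samples to OUTFILE.
run_bench() {
  local ref="$1" out="$2" count="${3:-5}"
  git checkout --quiet "$ref"
  # -run '^$' skips the unit tests; -benchmem records alloc/op and allocs/op.
  go test -run '^$' -bench . -benchmem -count "$count" ./... > "$out"
}

# compare_refs BASE HEAD [COUNT]  (hypothetical helper)
# Benchmarks both refs and prints the benchstat comparison to stdout.
compare_refs() {
  local base="$1" head="$2" count="${3:-5}"
  local dir
  dir="$(mktemp -d)"
  run_bench "$base" "$dir/base.txt" "$count"
  run_bench "$head" "$dir/head.txt" "$count"
  benchstat "$dir/base.txt" "$dir/head.txt"
}
```

Calling `compare_refs master my-pr-branch 4` (the branch name is a placeholder) would print the delta that a workflow could then post as the PR comment.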
@andersfylling that's exactly what I was thinking as well.
You are welcome to create a PR with your suggested workflow.
As for the assembly part of the implementation, I would suggest ignoring it for now. You can do this with build tags, see the command below.
# Benchmark the pure Go implementation
go test --tags noasm --bench=.
I would like to benchmark the assembly part at some point, but it requires its own set of benchmark tests. That should be easy to add to your suggested flow in the future, though.