memmetrics: simplify locking and solve data race
This package was updating counters concurrently because of incorrect locking. There's no real reason to have two separate locks, or to try to optimize with RW locks.
This change replaces the two RW locks with a single exclusive lock, which every RTMetrics method acquires. This is simple and will be easier to keep correct as the code evolves.
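For illustration, the consolidated pattern looks roughly like this (a sketch only; the field names are placeholders, not the actual RTMetrics fields):

package memmetrics

import "sync"

// rtMetricsSketch illustrates the single-lock pattern: one exclusive
// Mutex guards all counters, and every method takes it. The fields
// here are placeholders, not the real RTMetrics fields.
type rtMetricsSketch struct {
    mtx         sync.Mutex
    statusCodes map[int]int64
}

func newRTMetricsSketch() *rtMetricsSketch {
    return &rtMetricsSketch{statusCodes: make(map[int]int64)}
}

// Record increments the counter for code while holding the lock, so
// concurrent callers no longer race on the shared state.
func (m *rtMetricsSketch) Record(code int) {
    m.mtx.Lock()
    defer m.mtx.Unlock()
    m.statusCodes[code]++
}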
Here is the performance before/after with the benchmark I added.
We went from 280ns -> 255ns on my Intel 3GHz CPU. We got a little faster by switching from RWMutex to Mutex. The benchmark I added isn't concurrent, but this is sufficient evidence that we're not introducing any performance issue with this change. At 255ns per operation we can lock about 1s / 255ns ≈ 3.9 million times per second, which is much higher than any realistic HTTP request rate.
Before this change:
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4032144 280 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/vulcand/oxy/memmetrics 1.463s
After this change:
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4430594 253 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/vulcand/oxy/memmetrics 1.431s
Maybe add a concurrent benchmark?
package memmetrics

import (
    "runtime"
    "sync"
    "testing"
    "time"

    "github.com/stretchr/testify/require"
    "github.com/vulcand/oxy/testutils"
)

func BenchmarkRecordConcurrently(b *testing.B) {
    b.ReportAllocs()
    rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
    require.NoError(b, err)

    // Warm up the metrics. Adding a new code can allocate, but in the
    // steady state recording a code is cheap. We want to measure the
    // steady state.
    const codes = 100
    for code := 0; code < codes; code++ {
        rr.Record(code, time.Second)
    }

    concurrency := runtime.NumCPU() * 2
    wg := sync.WaitGroup{}
    wg.Add(concurrency)
    b.ResetTimer()
    for i := 0; i < concurrency; i++ {
        go func() {
            // Note: each goroutine records b.N results, so total work
            // is concurrency * b.N here.
            for j := 0; j < b.N; j++ {
                rr.Record(j%codes, time.Second)
            }
            wg.Done()
        }()
    }
    wg.Wait()
}
Our IT guys upgraded my CPU and I didn't even know it. I have 10 cores / 20 threads at 3.3GHz now (the previous benchmarks were also run on this CPU): Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz.
Good idea to add a concurrent benchmark, since the context in which this package will be used should be highly concurrent.
I switched the benchmark so that b.N is divided among the goroutines, which makes for a more apples-to-apples comparison: the same number of results get recorded in the concurrent and non-concurrent tests. I also set concurrency (the number of goroutines in the benchmark) equal to the number of CPUs (hardware threads), as in the sketch below. The goroutines end up with affinity to a hardware thread, so extra concurrency would only add scheduler overhead; I tried setting concurrency to double the number of CPUs and it only made about a 40ns/op difference.
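The revised version looks roughly like this (a sketch reconstructed from the description and the log output below, reusing the imports from the snippet above; the actual test may differ):

// Revised benchmark: b.N is divided among the goroutines so the total
// number of Record calls matches the non-concurrent benchmark.
func BenchmarkRecordConcurrently(b *testing.B) {
    b.ReportAllocs()
    rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
    require.NoError(b, err)

    // Warm up as before so we measure the steady state.
    const codes = 100
    for code := 0; code < codes; code++ {
        rr.Record(code, time.Second)
    }

    concurrency := runtime.NumCPU()
    b.Logf("NumCPU: %d, Concurrency: %d, GOMAXPROCS: %d",
        runtime.NumCPU(), concurrency, runtime.GOMAXPROCS(0))

    perG := b.N / concurrency // each goroutine records its share of b.N
    wg := sync.WaitGroup{}
    wg.Add(concurrency)
    b.ResetTimer()
    for i := 0; i < concurrency; i++ {
        go func() {
            defer wg.Done()
            for j := 0; j < perG; j++ {
                rr.Record(j%codes, time.Second)
            }
        }()
    }
    wg.Wait()
}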
The poor scaling of RWMutex with CPU count is explained here: https://github.com/golang/go/issues/17973. In short, even RLock/RUnlock update a shared reader count, so that cache line bounces between cores; a toy comparison is sketched below.
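For anyone who wants to see the effect in isolation, here is a toy pair of benchmarks (illustrative only, not part of this repo):

package lockbench

import (
    "sync"
    "testing"
)

// A toy comparison of lock overhead under parallelism, in the spirit
// of golang/go#17973. Not from the oxy repo.
func BenchmarkMutexParallel(b *testing.B) {
    var mu sync.Mutex
    var n int64
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.Lock()
            n++
            mu.Unlock()
        }
    })
}

func BenchmarkRWMutexRLockParallel(b *testing.B) {
    var mu sync.RWMutex
    var n int64
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.RLock()
            _ = n // read-only critical section
            mu.RUnlock()
        }
    })
}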
Here are the results before the locking change. We observe a 6.8x slowdown with concurrency.
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4036707 281 ns/op 0 B/op 0 allocs/op
BenchmarkRecordConcurrently-20 919545 1929 ns/op 0 B/op 0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok github.com/vulcand/oxy/memmetrics 3.260s
Here are the results after the locking change. We observe a 4x slowdown with concurrency, roughly twice as efficient as the old RWMutex code.
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4408964 255 ns/op 0 B/op 0 allocs/op
BenchmarkRecordConcurrently-20 1221457 1009 ns/op 0 B/op 0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok github.com/vulcand/oxy/memmetrics 3.669s
So this issue has not been resolved yet? I have the same issue. I think it may cause inaccurate statistics. Why not resolve this issue?