memmetrics: simplify locking and solve data race
This package was updating counters concurrently because of incorrect locking. There's no real reason to have two separate locks, or to try to optimize with RW locks.
This change replaces the two RW locks with a single exclusive lock, which every RTMetrics method acquires. This is simple and will be easier to keep correct as the code evolves.
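For illustration, the consolidated pattern looks roughly like this (a sketch only; the field names are placeholders, not the actual RTMetrics fields):

package memmetrics

import "sync"

// rtMetricsSketch illustrates the single-lock pattern: one exclusive
// Mutex guards all counters, and every method takes it. The fields
// here are placeholders, not the real RTMetrics fields.
type rtMetricsSketch struct {
    mtx         sync.Mutex
    statusCodes map[int]int64
}

func newRTMetricsSketch() *rtMetricsSketch {
    return &rtMetricsSketch{statusCodes: make(map[int]int64)}
}

// Record increments the counter for code while holding the lock, so
// concurrent callers no longer race on the shared state.
func (m *rtMetricsSketch) Record(code int) {
    m.mtx.Lock()
    defer m.mtx.Unlock()
    m.statusCodes[code]++
}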
Here is the performance before/after with the benchmark I added.
We went from 280ns -> 255ns on my Intel 3GHz CPU. We got a little faster by switching from RWMutex to Mutex. The benchmark I added isn't concurrent, but this is sufficient evidence that we're not introducing any performance issue with this change. At 255ns per operation we can lock about 1s / 255ns ≈ 3.9 million times per second, which is much higher than any realistic HTTP request rate.
Before this change:
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4032144 280 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/vulcand/oxy/memmetrics 1.463s
After this change:
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -count=1 -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4430594 253 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/vulcand/oxy/memmetrics 1.431s
Maybe add a concurrent benchmark?
package memmetrics

import (
    "runtime"
    "sync"
    "testing"
    "time"

    "github.com/stretchr/testify/require"
    "github.com/vulcand/oxy/testutils"
)

func BenchmarkRecordConcurrently(b *testing.B) {
    b.ReportAllocs()
    rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
    require.NoError(b, err)

    // Warm up the metrics. Adding a new code can allocate, but in the
    // steady state recording a code is cheap. We want to measure the
    // steady state.
    const codes = 100
    for code := 0; code < codes; code++ {
        rr.Record(code, time.Second)
    }

    concurrency := runtime.NumCPU() * 2
    wg := sync.WaitGroup{}
    wg.Add(concurrency)
    b.ResetTimer()
    for i := 0; i < concurrency; i++ {
        go func() {
            // Note: each goroutine records b.N results, so total work
            // is concurrency * b.N here.
            for j := 0; j < b.N; j++ {
                rr.Record(j%codes, time.Second)
            }
            wg.Done()
        }()
    }
    wg.Wait()
}
Our IT guys upgraded my CPU and I didn't even know it. I have 10 cores / 20 threads at 3.3GHz now (the previous benchmarks were also run on this CPU): Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz.
Good idea to add a concurrent benchmark, since the context in which this package will be used should be highly concurrent.
I switched the benchmark so that b.N is divided among the goroutines, which makes for a more apples-to-apples comparison: the same number of results get recorded in the concurrent and non-concurrent tests. I also set concurrency (the number of goroutines in the benchmark) equal to the number of CPUs (hardware threads), as in the sketch below. The goroutines end up with affinity to a hardware thread, so extra concurrency would only add scheduler overhead; I tried setting concurrency to double the number of CPUs and it only made about a 40ns/op difference.
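The revised version looks roughly like this (a sketch reconstructed from the description and the log output below, reusing the imports from the snippet above; the actual test may differ):

// Revised benchmark: b.N is divided among the goroutines so the total
// number of Record calls matches the non-concurrent benchmark.
func BenchmarkRecordConcurrently(b *testing.B) {
    b.ReportAllocs()
    rr, err := NewRTMetrics(RTClock(testutils.GetClock()))
    require.NoError(b, err)

    // Warm up as before so we measure the steady state.
    const codes = 100
    for code := 0; code < codes; code++ {
        rr.Record(code, time.Second)
    }

    concurrency := runtime.NumCPU()
    b.Logf("NumCPU: %d, Concurrency: %d, GOMAXPROCS: %d",
        runtime.NumCPU(), concurrency, runtime.GOMAXPROCS(0))

    perG := b.N / concurrency // each goroutine records its share of b.N
    wg := sync.WaitGroup{}
    wg.Add(concurrency)
    b.ResetTimer()
    for i := 0; i < concurrency; i++ {
        go func() {
            defer wg.Done()
            for j := 0; j < perG; j++ {
                rr.Record(j%codes, time.Second)
            }
        }()
    }
    wg.Wait()
}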
The poor scaling of RWMutex with CPU count is explained here: https://github.com/golang/go/issues/17973. In short, even RLock/RUnlock update a shared reader count, so that cache line bounces between cores; a toy comparison is sketched below.
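For anyone who wants to see the effect in isolation, here is a toy pair of benchmarks (illustrative only, not part of this repo):

package lockbench

import (
    "sync"
    "testing"
)

// A toy comparison of lock overhead under parallelism, in the spirit
// of golang/go#17973. Not from the oxy repo.
func BenchmarkMutexParallel(b *testing.B) {
    var mu sync.Mutex
    var n int64
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.Lock()
            n++
            mu.Unlock()
        }
    })
}

func BenchmarkRWMutexRLockParallel(b *testing.B) {
    var mu sync.RWMutex
    var n int64
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.RLock()
            _ = n // read-only critical section
            mu.RUnlock()
        }
    })
}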
Here are the results before the locking change. We observe a 6.8x slowdown with concurrency.
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4036707 281 ns/op 0 B/op 0 allocs/op
BenchmarkRecordConcurrently-20 919545 1929 ns/op 0 B/op 0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok github.com/vulcand/oxy/memmetrics 3.260s
Here are the results after the locking change. We observe a 4x slowdown with concurrency, roughly twice as efficient as the old RWMutex code.
seth.bunce@GQBS9Z2-DT:~/go/src/github.com/vulcand/oxy$ go test -bench=. ./memmetrics
goos: linux
goarch: amd64
pkg: github.com/vulcand/oxy/memmetrics
BenchmarkRecord-20 4408964 255 ns/op 0 B/op 0 allocs/op
BenchmarkRecordConcurrently-20 1221457 1009 ns/op 0 B/op 0 allocs/op
--- BENCH: BenchmarkRecordConcurrently-20
roundtrip_test.go:48: NumCPU: 20, Concurrency: 20, GOMAXPROCS: 20
PASS
ok github.com/vulcand/oxy/memmetrics 3.669s
So this issue has not been resolved yet? I have the same issue. I think it may cause inaccurate statistics. Why not resolve this issue?