common icon indicating copy to clipboard operation
common copied to clipboard

Rate limited errors

Open joe-elliott opened this issue 5 years ago • 5 comments

This PR attempts to address concerns in #192. It provides

  • A rate limited logger
  • A method of configuring the logger for high volume http errors through the server
  • Fixed possible race condition

While working on this PR I noticed this:

func (l Log) logWithRequest(r *http.Request) logging.Interface {
	traceID, ok := ExtractTraceID(r.Context())
	if ok {
		l.Log = l.Log.WithField("traceID", traceID)
	}

	return user.LogWith(r.Context(), l.Log)
}

Particularly the line: l.Log = l.Log.WithField("traceID", traceID) I'm unsure why the struct var is being overwritten in logWithRequest. This seems like a possible race condition to me? I assume there's one instance of the middleware that is shared by all requests in the pipeline?

joe-elliott avatar Aug 14 '20 12:08 joe-elliott

why the struct var is being overwritten

it's a local variable, so not likely to cause a problem, but this was made less confusing in #193

bboreham avatar Aug 20 '20 12:08 bboreham

Sorry I haven't been back here for a while. Code looks ok now; do you still want to merge it? Have you tried it out for real?

bboreham avatar Oct 29 '20 14:10 bboreham

@joe-elliott ping on questions

bboreham avatar Dec 01 '20 09:12 bboreham

Honestly, this has kind of fallen off of my radar. The initial need for this no longer something I am no longer involved in. Having said that I do think the rate limited logger has some value perhaps in weaveworks/common if we want to keep it.

The original reason this existed was to catch errors returned from Loki in the cortex-gw when the query frontend crashed. @cyriltovena @owen-d what do you think? is there still a need for this?

joe-elliott avatar Dec 01 '20 15:12 joe-elliott

Right now I have a high rate of errors like this:

level=warn ts=2021-03-01T11:35:02.01609324Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=2.741131ms err="rpc error: code = Code(429) desc = user=redacted: per-metric series limit (local limit: 300000 global limit: 0 actual local limit: 300000) exceeded for series {redacted}" msg="gRPC\n"

I think this PR as it stands would not help me, because it is focused on one specific http error.

bboreham avatar Mar 01 '21 11:03 bboreham