routing-release icon indicating copy to clipboard operation
routing-release copied to clipboard

Gorouter does not count 404s (route not found) as requests in metrics

Open domdom82 opened this issue 2 years ago • 6 comments

Is this a security vulnerability?

No

Issue

Gorouter ignores unknown_route errors for the purpose of metrics. It does handle the request and returns a 404 to the client but it does not count that as a request that's been handled by gorouter, nor does it count the 404 as a responses_4xx value in metrics.

Affected Versions

All

Context

In production, we have HAProxy sitting in front of gorouter for TLS offloading. The 4xx status code metrics between HAProxy differ in that 404s coming from gorouter are recorded as 4xx errors in HAProxy but not in gorouter. There, the 404s are "invisible" on the metrics dashboards. We feel this is wrong, because even though there was no backend app that responded to the request, there still was a request and it should be counted as such.

Steps to Reproduce

  1. Push some-app.cf-app.com
  2. curl https://some-app.cf-app.com
  3. curl -u <user:pwd> http://gorouter:8082/varz
  4. Notice how the requests and responses_2xx metrics have increased
  5. curl https://does-not-exist.cf-app.com
  6. curl -u <user:pwd> http://gorouter:8082/varz
  7. Notice how the requests and responses_4xx metrics have not increased

Expected result

requests and responses_4xx metrics should include 404 unknown_route requests.

Note: By extension, this should also apply to these errors:

  • 400 empty_host
  • 400 invalid_cf_app_instance_header
  • 503 no_endpoints
  • 503 Connection Limit Reached

Current result

requests and responses_4xx remain the same

Possible Fix

The request handling is stopped early in the lookup handler if the request cannot be forwarded. A likely solution will be to also record the correct metrics in this handler.

domdom82 avatar Nov 22 '23 13:11 domdom82

There are 2 sorts of "4xx" series messages that can be returned -- there's the curl https://does-not-exist.cf-app.com version, but also the version where the targeted application returns a 400. That might account for the difference in metrics between haproxy and gorouter. As per above, the "does-not-exist" does not increment the requests nor responses_4xx metrics. However, using a simple echo application (with ?status=401, see here and here) an application that exists and returns a 4xx message causes both metrics to be incremented. In the case above for "does-not-exist", note that the "bad_requests" metric is incremented.

The "bad_request" reflects the non-existant request (as no such application could be delivered), whereas the requests and response_XXX metrics represent the back-end health. For purposes of correlation with a load-balancer/proxy in front of the platform, including the "bad_request" metric will probably allow the numbers to align.

tpbrisco avatar Feb 20 '24 21:02 tpbrisco

Hey @domdom82 , Did @tpbrisco's analysis meet your needs?

MarcPaquette avatar Mar 18 '24 14:03 MarcPaquette

@domdom82 -- Dominc, I wanted to check in if there are open questions on this item. I'd like to submit it for closure. Check the above, it might answer your questions - or if there are other inquiries, let me know Tom

tpbrisco avatar Apr 02 '24 00:04 tpbrisco

Hi @tpbrisco @MarcPaquette , thanks for your replies. I think we disagree on what might be considered a "request". Looking at the code, current metrics suggest that only a request that made it through to the backend successfully is reported in the "total_requests" bucket.

To me, the term "total_requests" includes all requests handled by gorouter, whether they are successful or not, because work was done on the proxy that needs to be accounted for.

@tpbrisco following your logic I could combine the two buckets "total_requests" and "rejected_requests" (referred to as bad_requests in the code), however that would still not give me the full picture.

To gain the true number of requests handled by gorouter at a certain point in time, I would have to combine all of these metrics:

So my argument is to include not just requests that made it to the backend (better: ProxyRoundTripper) but also include requests that were handled by gorouter and got stopped in lookup.go - they are still requests and should be counted as such.

One solution could be, to

  • add l.reporter.CaptureRoutingRequest(nil) to the calls to l.reporter.CaptureBadRequest() in lookup.go or
  • add m.Batcher.BatchIncrementCounter("total_requests") to m.Batcher.BatchIncrementCounter("rejected_requests") and m.Batcher.BatchIncrementCounter("backend_exhausted_conns") in metricsreporter.go

domdom82 avatar Apr 04 '24 12:04 domdom82

Hey Dominick; I think I'm seeing what you're getting at -- let me reiterate to see if I'm on target.

At the moment, current behavior is

  • the metric "requests" is sum(responses_[2345x]xx) - leaving out bad_requests (and probably bad_gateways)

Desired behavior

  • the metric "requests" includes all requests (good, bad, or otherwise) handled by gorouter In this way, we could say that the "requests" metric represents the number of transactions handled by the gorouter. whereas the sum of the "responses_" metrics represents the ProxyRoundTripper handling.

In theory, we'd need to ensure that requests_per_sec was updating accordingly with the above?

Tom

On Thu, Apr 4, 2024 at 8:13 AM Dominik Froehlich @.***> wrote:

Hi @tpbrisco https://github.com/tpbrisco @MarcPaquette https://github.com/MarcPaquette , thanks for your replies. I think we disagree on what might be considered a "request". Looking at the code, current metrics suggest that only a request that made it through to the backend successfully is reported https://github.com/cloudfoundry/gorouter/blob/main/metrics/metricsreporter.go#L47 in the "total_requests" bucket.

To me, the term "total_requests" includes all requests handled by gorouter, wether they are successful or not, because work was done on the proxy that needs to be accounted for.

@tpbrisco https://github.com/tpbrisco following your logic I could combine the two buckets "total_requests" and "rejected_requests" (referred to as bad_requests https://github.com/cloudfoundry/gorouter/blob/main/metrics/metricsreporter.go#L38 in the code), however that would still not give me the full picture.

To gain the true number of requests handled by gorouter at a certain point in time, I would have to combine all of these metrics:

So my argument is to include not just requests that made it to the backend (better: ProxyRoundTripper https://github.com/cloudfoundry/gorouter/blob/main/proxy/round_tripper/proxy_round_tripper.go#L304) but also include requests that were handled by gorouter and got stopped in lookup.go https://github.com/cloudfoundry/gorouter/blob/main/handlers/lookup.go - they are still requests and should be counted as such.

One solution could be, to

  • add l.reporter.CaptureRoutingRequest(nil) to the calls to l.reporter.CaptureBadRequest() in lookup.go or
  • add m.Batcher.BatchIncrementCounter("total_requests") to m.Batcher.BatchIncrementCounter("rejected_requests") and m.Batcher.BatchIncrementCounter("backend_exhausted_conns") in metricsreporter.go

— Reply to this email directly, view it on GitHub https://github.com/cloudfoundry/routing-release/issues/365#issuecomment-2037015150, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDCCK7ROFFUDX3UWMIVVK3Y3U7YBAVCNFSM6AAAAAA7WIRQMGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZXGAYTKMJVGA . You are receiving this because you were mentioned.Message ID: @.***>

tpbrisco avatar Apr 07 '24 13:04 tpbrisco

@tpbrisco yep, you got that right. Exactly my sentiment.

domdom82 avatar Apr 24 '24 11:04 domdom82