
prometheus-proxy with Ingress-Nginx

menvol3 opened this issue 1 year ago · 72 comments

Hello

I'm working on deploying Prometheus-proxy within the AWS EKS cluster and making it accessible to agents via NGINX ingress.

I’ve configured NGINX to forward incoming traffic to the backend on port 50051, which is used for gRPC.

Rules:
  Host                          Path  Backends
  ----                          ----  --------
  prometheus-proxy.service.net  /     prometheus-proxy:50051 (10.10.51.143:50051)
Annotations:
  nginx.ingress.kubernetes.io/backend-protocol: GRPC
  nginx.ingress.kubernetes.io/ssl-redirect: true

Here is my service configuration:

Type:              ClusterIP
Port:              grpc  50051/TCP
TargetPort:        50051/TCP
Endpoints:         10.10.51.143:50051
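The describe output above corresponds roughly to manifests like the following (a sketch only; the resource names, namespace, and selector labels are assumptions, not taken from the actual cluster):

```yaml
# Hypothetical manifests matching the describe output above.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-proxy
spec:
  type: ClusterIP
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
      protocol: TCP
  selector:
    app: prometheus-proxy
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-proxy
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus-proxy.service.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-proxy
                port:
                  number: 50051
```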

The proxy started with the default configuration, and the agent was started with the following arguments:

java -jar prometheus-agent.jar -Dagent.proxy.hostname=prometheus-proxy.service.net:443 --config config.conf

My issue is that the agent is unable to connect to the proxy. There are no logs in the proxy indicating any connection attempts, and I also can't see any related logs in my NGINX pod about the agent's connection.

The agent log hangs at: 15:44:12.103 INFO [AgentGrpcService.kt:175] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...

However, when I try to access the proxy with the grpcurl tool using the command grpcurl -insecure prometheus-proxy.service.net:443 list, I receive the following output: Failed to list services: server does not support the reflection API

Additionally, some logs appear in the ingress controller:

145.245.130.26 - - [29/Nov/2024:14:56:45 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.049 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.049 200 4a75fd48134f8d5b11939074f090ad4f
145.245.130.26 - - [29/Nov/2024:14:56:45 +0000] "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 62 0.007 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.007 200 411fe7fc6bbe7c37b2c445d56874353a
145.245.130.26 - - [29/Nov/2024:14:56:46 +0000] "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 16 0.007 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.007 200 55dd037169407a9b404159bf4ff194f7

Also, I tried setting TRANSPORT_FILTER_DISABLED for both the proxy and the agent, but it didn't help.

Could someone please point out what I may have missed or what might be causing the issue?

menvol3 avatar Nov 29 '24 15:11 menvol3

Hi @menvol3

It is odd that you are hanging and not getting an exception when connecting. It is hanging in a stub, so we are at the mercy of gRPC.

How about if I enable the reflection API, and then you can see if grpcurl works? Perhaps grpcurl will provide a clue on what the issue is.

Paul

pambrose avatar Nov 29 '24 17:11 pambrose

I added support for gRPC Reflection. Let's see what grpcurl reports for your setup. I will push the release out a little later today.

pambrose avatar Nov 29 '24 19:11 pambrose

I pushed the 1.23.0 release. Let's see what grpcurl has to say.

If you use the grpcurl -plaintext option, make sure you comment out any tls {} config info on the proxy.
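As an illustration of what "comment out the tls {} config" means, a proxy config might look something like this (a sketch based on the HOCON layout used by prometheus-proxy; check the exact option names against the README):

```hocon
proxy {
  agent {
    # Plaintext gRPC port the agents (and grpcurl -plaintext) connect to
    port = 50051
  }

  # Commented out to disable TLS so the proxy accepts plaintext connections:
  # tls {
  #   certChainFilePath = "testing/certs/server1.pem"
  #   privateKeyFilePath = "testing/certs/server1.key"
  # }
}
```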

pambrose avatar Nov 30 '24 04:11 pambrose

Hello @pambrose

I've tried grpcurl with the latest versions of the proxy and client, and it seems to be working now

grpcurl -d '{}' prometheus-proxy.service.net:443 list

And output was

Warning: The -d argument is not used with 'list' or 'describe' verb.
ProxyService
grpc.reflection.v1.ServerReflection

However, the client is still hanging with the same status: INFO [AgentGrpcService.kt:175] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...

menvol3 avatar Dec 02 '24 17:12 menvol3

Interesting. How about commenting out any tls settings on the proxy and try grpcurl with the -plaintext option.

pambrose avatar Dec 02 '24 18:12 pambrose

Also, it is curious that your log info says you are using port 443 with plaintext. Port 443 is usually associated with HTTPS traffic, which is not plaintext.

Do you have tls settings in your proxy or agent config files?

pambrose avatar Dec 02 '24 18:12 pambrose

Interesting. How about commenting out any tls settings on the proxy and try grpcurl with the -plaintext option.

Failed to dial target host "prometheus-proxy.service.net:443": context deadline exceeded

Do you have tls settings in your proxy or agent config files?

I only have an SSL certificate via AWS ACM. When the client makes a request, it goes to the AWS ALB. The ALB terminates the HTTPS connection and forwards the data to the ingress, which then sends the data to the proxy.

menvol3 avatar Dec 03 '24 13:12 menvol3

The hope was to comment out all things TLS-related in your config on both ends and see if it works with plaintext on a non-HTTPS port. If that works, then we can conclude the issue is with your TLS setup. If the ALB is forcing you to HTTPS, then that presents a problem. But if grpcurl can connect plaintext, then the agent should be able to connect via plaintext as well, assuming the ALB is involved in both cases. The grpcurl client behaves the same as the agent code.

pambrose avatar Dec 03 '24 15:12 pambrose

If the ALB is forcing you to HTTPS, then that presents a problem.

Unfortunately, that's the case in my setup: HTTPS is configured on the ALB side, so I can't disable it for a separate ingress object.

menvol3 avatar Dec 03 '24 15:12 menvol3

Then I am wondering how you were able to connect with grpcurl. If grpcurl worked, then the agent should work with the same TLS settings. They both use the same gRPC libraries.

pambrose avatar Dec 03 '24 16:12 pambrose

Yeah, it is really strange. Below is the output from the ingress controller after I made four requests using grpcurl

145.245.130.26 - - [03/Dec/2024:16:12:00 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.209 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.208 200 815d320c5af57fdbcb14ff5b80adebf7
145.245.130.26 - - [03/Dec/2024:16:13:03 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 34 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 158 0.190 [prometheus-proxy-50051] [] 10.10.51.143:50051 67 0.190 200 5e6e508703486783057ff7bb8768ad1b
145.245.130.26 - - [03/Dec/2024:16:13:09 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.189 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.189 200 884c7af09decfb1b94262aedd44945c2
145.245.130.26 - - [03/Dec/2024:16:13:24 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.185 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.186 200 45f6d6b3acb6b35f1bd8726685f49d95

However, at the same time, the proxy agent still hangs on

16:11:31.176 INFO [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...

and there is nothing about it in the ingress logs.

menvol3 avatar Dec 03 '24 16:12 menvol3

What arguments did you use with grpcurl?

pambrose avatar Dec 03 '24 16:12 pambrose

None, just grpcurl prometheus-proxy.service.net:443 list

menvol3 avatar Dec 03 '24 16:12 menvol3

Am I understanding this correctly? At this moment, for the connection between the agent and the proxy, we only need to expose the gRPC port, which defaults to 50051? In my case, the request first comes into the ALB on port 443, where HTTPS is terminated, and then the traffic is forwarded to the proxy.

menvol3 avatar Dec 03 '24 16:12 menvol3

Yes, only one port is required, and 50051 is the default. The proxy log file reports which port is being used.

pambrose avatar Dec 03 '24 16:12 pambrose

Is the ALB forwarding to port 50051?

pambrose avatar Dec 03 '24 16:12 pambrose

That feels like where the problem is. But again, if the ALB is forwarding properly for grpcurl, it should be doing the same for the agent.

pambrose avatar Dec 03 '24 16:12 pambrose

Hang on, dumb question here: why is the proxy on the backside of the ALB? The agent should be on the backside, and it calls out to the proxy, which is not behind the ALB. The agent connects from the inside to the outside of the ALB, which is where the proxy is.

pambrose avatar Dec 03 '24 16:12 pambrose

Do you know what I am saying?

pambrose avatar Dec 03 '24 16:12 pambrose

Is the ALB forwarding to port 50051?

The ALB terminates TLS (HTTPS on port 443) and forwards the traffic to the ingress as plain HTTP/2. The ingress controller then forwards the traffic to the gRPC service on port 50051.

But again, if the ALB is forwarding properly for grpcurl, it should be doing the same for the agent.

Yeah, it looks like requests via grpcurl are working normally, but the agent still can't initiate a connection to the proxy for some reason.

menvol3 avatar Dec 03 '24 16:12 menvol3

Do you know what I am saying?

In my setup, the agent is located on the same host that I want to monitor with Prometheus. However, that host is not directly accessible to Prometheus. Therefore, we are using an agent that employs a push model, allowing it to connect to the proxy, which is accessible to the agent (e.g., through an allowlist). Is my understanding correct?

menvol3 avatar Dec 03 '24 16:12 menvol3

See if this matches the way you are thinking of it: The agents run next to the servers you want to monitor, which are on the backside of the ALB. The proxy runs next to Prometheus, which is on the frontside of the ALB. The agents connect to the proxy via gRPC, not through the ALB machinery. Prometheus makes requests to the proxy, and the proxy uses the bidirectional traffic on the agent connection to have the agent scrape the data and return it. The agent would not use the ALB because there is only one proxy, i.e., there is no load balancing going on in that direction.

pambrose avatar Dec 03 '24 16:12 pambrose

The agent -> proxy connection would go counter to the inbound ALB traffic.

pambrose avatar Dec 03 '24 16:12 pambrose

Maybe I am misunderstanding your setup, but given the number of people successfully using the proxy on AWS, I am exploring this avenue of inquiry.

pambrose avatar Dec 03 '24 16:12 pambrose

Here’s my current setup:

  • Prometheus is deployed in an AWS EKS cluster.

  • Prometheus-proxy is also deployed within the same cluster but has its own external endpoint.

  • I have infrastructure in several offices that I want to monitor, but it is not directly accessible by Prometheus. Each host has a metrics exporter and will run prometheus-agent to scrape metrics from that exporter.

I added the external IP addresses of these offices to the AWS ALB allowlist so they can access the external endpoint of Prometheus-proxy.

And this setup works with grpcurl, but it's not currently working for the agent.

The external endpoint listens on 443 and forwards all requests to the proxy in EKS on port 50051. The ingress is configured as described here: https://kubernetes.github.io/ingress-nginx/examples/grpc/

menvol3 avatar Dec 03 '24 21:12 menvol3

Everything you are saying sounds reasonable. I have not used the proxy in an EKS setup, but thousands of people are using gRPC with it, so we are not trying to do the impossible here.

I should probably create a parallel setup to yours on AWS and play with the GRPC_TRACING. Any chance you could sketch out a diagram of your AWS setup? That will allow me to get as close as possible to recreating it.

pambrose avatar Dec 04 '24 19:12 pambrose

Sure, below is a diagram of how it’s deployed in my setup:

  • There is a monitoring namespace inside the EKS cluster where both Prometheus and Prometheus-proxy are deployed.

  • Prometheus-proxy has an ingress object, making it accessible from the internet, but only for IP addresses listed in the access list.

  • Prometheus does not have an ingress object but can access the Prometheus-proxy pod using internal DNS within the cluster, enabling their communication.

[diagram attached: prometheus-proxy.drawio]

menvol3 avatar Dec 05 '24 11:12 menvol3

I did not have time today to recreate your setup, but I did have a new idea. grpcurl defaults to using TLS, which I suspect is why it works on port 443. It is written in Go, and apparently Go defaults to using the system-level TLS certs. The agent defaults to plaintext, which, as you have seen, fails on port 443. I think the solution is to add the TLS options on the agent.

The README has some notes on how to set up TLS on the proxy. I am assuming you do not need mutual authentication, since grpcurl did not require it against the proxy. The prometheus-proxy repo has some certs that you can use after cloning the repo.

pambrose avatar Dec 06 '24 07:12 pambrose

Yep, you are right, it works now:

14:38:15.672 INFO [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]

I had to fetch the public certificate from AWS ACM and set its path in the agent's properties.

Also, I didn't set the TRANSPORT_FILTER_DISABLED arg to true.
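For reference, an agent-side config that enables TLS without mutual auth would look roughly like this (a sketch; the tls option names follow the README's HOCON layout, and the cert path is a placeholder for the certificate exported from ACM):

```hocon
agent {
  proxy {
    hostname = "prometheus-proxy.service.net"
    port = 443
  }

  tls {
    # Public certificate fetched from AWS ACM; no client cert/key here,
    # since mutual authentication is not required.
    trustCertCollectionFilePath = "acm-public-cert.pem"
  }
}
```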

Thank You

menvol3 avatar Dec 06 '24 12:12 menvol3

Also, would it be possible for you to add support for trusted public CAs to the agent, so that it can automatically negotiate the connection?

menvol3 avatar Dec 06 '24 12:12 menvol3