prometheus-proxy with Ingress-Nginx
Hello
I'm working on deploying Prometheus-proxy within the AWS EKS cluster and making it accessible to agents via NGINX ingress.
I’ve configured NGINX to forward incoming traffic to the backend on port 50051, which is used for gRPC.
Rules:
  Host                          Path  Backends
  ----                          ----  --------
  prometheus-proxy.service.net
                                /     prometheus-proxy:50051 (10.10.51.143:50051)
Annotations:
  nginx.ingress.kubernetes.io/backend-protocol: GRPC
  nginx.ingress.kubernetes.io/ssl-redirect: true
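For reference, a minimal Ingress manifest that reproduces these rules might look like this (the resource name and ingressClassName are reconstructions, not taken from the output above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-proxy              # assumed name
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx             # assumed class name
  rules:
    - host: prometheus-proxy.service.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-proxy
                port:
                  number: 50051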
Here is also my Service configuration:
Type:        ClusterIP
Port:        grpc  50051/TCP
TargetPort:  50051/TCP
Endpoints:   10.10.51.143:50051
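A matching Service manifest sketch (the selector label is an assumption):

apiVersion: v1
kind: Service
metadata:
  name: prometheus-proxy
spec:
  type: ClusterIP
  selector:
    app: prometheus-proxy             # assumed pod label
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
      protocol: TCP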
The proxy was started with the default configuration, and the agent was started with the following args:
java -jar prometheus-agent.jar -Dagent.proxy.hostname=prometheus-proxy.service.net:443 --config config.conf
My issue is that the agent is unable to connect to the proxy. There are no logs in the proxy indicating any connection attempts, and I also can't see any related logs in my NGINX pod about the agent's connection.
The agent log hangs at the connecting step:
15:44:12.103 INFO [AgentGrpcService.kt:175] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...
However, when I try to access the proxy using the grpcurl tool with the following command:
grpcurl -insecure prometheus-proxy.service.net:443 list
I receive the following output:
Failed to list services: server does not support the reflection API
Additionally, some logs appear in the ingress controller:
145.245.130.26 - - [29/Nov/2024:14:56:45 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.049 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.049 200 4a75fd48134f8d5b11939074f090ad4f
145.245.130.26 - - [29/Nov/2024:14:56:45 +0000] "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 62 0.007 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.007 200 411fe7fc6bbe7c37b2c445d56874353a
145.245.130.26 - - [29/Nov/2024:14:56:46 +0000] "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 0 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 16 0.007 [prometheus-proxy-50051] [] 10.10.51.143:50051 0 0.007 200 55dd037169407a9b404159bf4ff194f7
Also, I tried setting TRANSPORT_FILTER_DISABLED for both the proxy and the agent, but it didn't help.
Could someone please point out what I may have missed or what might be causing the issue?
Hi @menvol3
It is odd that you are hanging and not getting an exception when connecting. It is hanging in a stub, so we are at the mercy of gRPC.
How about if I enable the reflection API, and then you can see if grpcurl works? Perhaps grpcurl will provide a clue on what the issue is.
Paul
I added support for gRPC Reflection. Let's see what grpcurl reports for your setup. I will push the release out a little later today.
I pushed the 1.23.0 release. Let's see what grpcurl has to say.
If you use the grpcurl -plaintext option, make sure you comment out any tls {} config info on the proxy.
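On the proxy side that would mean something like this in the HOCON config (cert paths are placeholders; key names as in the README's TLS section):

proxy {
  # Comment out the tls block so the gRPC port serves plaintext:
  # tls {
  #   certChainFilePath = "certs/server.pem"
  #   privateKeyFilePath = "certs/server.key"
  # }
}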
Hello @pambrose
I've tried grpcurl with the latest versions of the proxy and agent, and it seems to be working now:
grpcurl -d '{}' prometheus-proxy.service.net:443 list
The output was:
Warning: The -d argument is not used with 'list' or 'describe' verb.
ProxyService
grpc.reflection.v1.ServerReflection
However, the agent is still hanging with the same status:
INFO [AgentGrpcService.kt:175] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...
Interesting. How about commenting out any tls settings on the proxy and trying grpcurl with the -plaintext option.
Also, it is curious that your log info says you are using port 443 with plaintext. 443 is usually associated with HTTPS traffic, which is not plaintext.
Do you have tls settings in your proxy or agent config files?
> Interesting. How about commenting out any tls settings on the proxy and trying grpcurl with the -plaintext option.
Failed to dial target host "prometheus-proxy.service.net:443": context deadline exceeded
> Do you have tls settings in your proxy or agent config files?
I only have an SSL certificate via AWS ACM. When the client makes a request, it goes to the AWS ALB. The ALB terminates the HTTPS connection and forwards the data to the ingress, which then sends the data to the proxy.
The hope was to comment out all things TLS-related in your config on both ends and see if it works with plaintext on a non-HTTPS port. If that works, then we can conclude the issue is with your TLS setup. If the ALB is forcing you to HTTPS, then that presents a problem. But if grpcurl can connect plaintext, then the agent should be able to connect via plaintext as well, assuming the ALB is involved in both cases. The grpcurl client is the same as the agent code.
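For example, one way to test plaintext while bypassing the ALB entirely is a port-forward (the monitoring namespace and the Service name are assumptions taken from elsewhere in this thread):

kubectl -n monitoring port-forward svc/prometheus-proxy 50051:50051
grpcurl -plaintext localhost:50051 list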
> If the ALB is forcing you to HTTPS, then that presents a problem.
Unfortunately, that's the case in my setup: HTTPS is configured on the ALB side, so I can't disable it for a separate ingress object.
Then I am wondering how you were able to connect with grpcurl. If grpcurl worked, then the agent should work with the same tls settings. They both use the same grpc libraries.
Yeah, it is really strange. Below is the output from the ingress controller after I made four requests using grpcurl
145.245.130.26 - - [03/Dec/2024:16:12:00 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.209 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.208 200 815d320c5af57fdbcb14ff5b80adebf7
145.245.130.26 - - [03/Dec/2024:16:13:03 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 34 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 158 0.190 [prometheus-proxy-50051] [] 10.10.51.143:50051 67 0.190 200 5e6e508703486783057ff7bb8768ad1b
145.245.130.26 - - [03/Dec/2024:16:13:09 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.189 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.189 200 884c7af09decfb1b94262aedd44945c2
145.245.130.26 - - [03/Dec/2024:16:13:24 +0000] "POST /grpc.reflection.v1.ServerReflection/ServerReflectionInfo HTTP/2.0" 200 67 "-" "grpcurl/1.9.1 grpc-go/1.61.0" 157 0.185 [prometheus-proxy-50051] [] 10.10.51.143:50051 100 0.186 200 45f6d6b3acb6b35f1bd8726685f49d95
However, at the same time, the proxy-agent still hangs on
16:11:31.176 INFO [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using plaintext...
And there is nothing about it in the ingress logs.
What arguments did you use with grpcurl?
None, just:
grpcurl prometheus-proxy.service.net:443 list
Am I understanding this correctly? At this moment, for the connection between the agent and the proxy, we only need to expose the gRPC port, which by default is 50051? In my case, the request first comes into the ALB on port 443, where HTTPS is terminated, and then the traffic is forwarded to the proxy
Yes, only one port is required and I think 50051 is the default. The proxy log file reports which port is being used.
Is the ALB forwarding to port 50051?
That feels like where the problem is. But again, if the ALB is forwarding properly for grpcurl, it should be doing the same for the agent.
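For reference, the gRPC listen port on the proxy side is configured roughly like this (key name per the proxy's default HOCON config; double-check the README for your version):

proxy {
  agent.port = 50051   # the port the ALB/ingress path must ultimately reach
}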
Hang on, dumb question here: why is the proxy on the backside of the ALB? The agent should be on the backside, and it calls out to the proxy, which is not behind the ALB. The agent connects from the inside to the outside of the ALB, which is where the proxy is.
Do you know what I am saying?
> Is the ALB forwarding to port 50051?
ALB terminates TLS (HTTPS on port 443) and forwards traffic to the Ingress (plain HTTP/2). The Ingress Controller then forwards the traffic to the gRPC service on port 50051.
> But again, if the ALB is forwarding properly for grpcurl, it should be doing the same for the agent.
Yeah, it looks like requests via grpcurl are working normally; despite this, the agent can't initiate a connection to the proxy for some reason.
> Do you know what I am saying?
In my setup, the agent is located on the same host that I want to monitor with Prometheus. However, that host is not directly accessible to Prometheus. Therefore, we are using an agent that employs a push model, allowing it to connect to the proxy, which is accessible to the agent (e.g., through an allowlist). Is that a correct understanding on my part?
See if this matches the way you are thinking of it: The agents run next to the servers you want to monitor, which are on the backside of the ALB. The proxy runs next to Prometheus, which is on the frontside of the ALB. The agents connect to the proxy via gRPC, not using the ALB machinery. Prometheus makes requests to the proxy, and the proxy uses the bidirectional traffic on the agent connection to get the agent to scrape the data and return it. The agent would not use the ALB because there is only one proxy, i.e., there is no balancing going on in that direction.
The agent -> proxy connection would go counter to the inbound ALB traffic.
Maybe I am misunderstanding your setup, but given the number of people successfully using the proxy on AWS, I am exploring this avenue of inquiry.
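A rough sketch of the topology I am describing:

[servers] <-- scrape -- [agents]  --gRPC-->  [proxy] <-- scrape -- [Prometheus]
       (backside of the ALB)                     (frontside of the ALB)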
Here’s my current setup:
- Prometheus is deployed in an AWS EKS cluster.
- Prometheus-proxy is also deployed within the same cluster but has its own external endpoint.
- I have infrastructure in several offices that I want to monitor, but this infrastructure is not directly accessible by Prometheus. Each host has a metrics exporter and will run a prometheus-agent to scrape metrics from that exporter.
- I added the external IP addresses of these offices to the AWS ALB allowlist so they can access the external endpoint of Prometheus-proxy.
This setup works with grpcurl, but it's not currently working for the agent.
The external endpoint listens on 443 and forwards all requests to the proxy in EKS on port 50051. The Ingress is configured as described here: https://kubernetes.github.io/ingress-nginx/examples/grpc/
Everything you are saying sounds reasonable. I have not used the proxy in an EKS setup, but thousands of people are using gRPC with it, so we are not trying to do the impossible here.
I should probably create a parallel setup to yours on AWS and play with the GRPC_TRACING. Any chance you could sketch out a diagram of your AWS setup? That will allow me to get as close as possible to recreating it.
Sure, below is a diagram of how it’s deployed in my setup:
- There is a monitoring namespace inside the EKS cluster where both Prometheus and Prometheus-proxy are deployed.
- Prometheus-proxy has an ingress object, making it accessible from the internet, but only for IP addresses listed in the access list.
- Prometheus does not have an ingress object but can access the Prometheus-proxy pod using internal DNS within the cluster, enabling their communication.
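In text form, the traffic flow would be roughly (a sketch reconstructed from the bullets above):

office host: [exporter] <-- scrape -- [prometheus-agent]
                                            |
                             gRPC from allowlisted office IPs
                                            v
[AWS ALB :443, ACM cert] --> [ingress-nginx, GRPC backend] --> [prometheus-proxy :50051]
                                                                      ^
                            [Prometheus] -- scrape via internal cluster DNS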
I did not have time today to recreate your setup, but I did have a new idea. grpcurl defaults to using TLS, so I suspect that is why it is working on port 443. It is written in Go, and Go apparently defaults to using the system-level TLS trust store. The agent defaults to plaintext, which, as you have seen, fails on port 443. I think the solution is to add the TLS options on the agent.
The README has some notes on how to set up TLS on the proxy. I am assuming you do not need mutual authentication since grpcurl did not require it on the proxy. The prometheus-proxy repo has some certs that you can use after cloning the repo.
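A minimal agent-side sketch for server-side TLS without mutual auth (key names follow the README's TLS section; the cert path is a placeholder):

agent {
  proxy.hostname = "prometheus-proxy.service.net:443"
  tls {
    # root/CA certificate that signed the cert presented on port 443
    trustCertCollectionFilePath = "certs/ca.pem"
  }
}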
Yep, you are right, now it works:
14:38:15.672 INFO [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
I had to fetch the public certificate from AWS Certificate Manager (ACM) and set the path to it in the agent's properties.
Also, I didn't have to set the TRANSPORT_FILTER_DISABLED arg to true.
Thank you!
Also, would it be possible for you to add support for trusted public CAs to the agent, so that it can automatically negotiate the connection?