opentelemetry-java-instrumentation

The type of kafka_producer_connection_count keeps changing between counter and gauge

Open tuhao1020 opened this issue 3 years ago • 12 comments

OpenTelemetry Java agent version: 1.20.2; Kafka version: 3.1.1; OpenTelemetry Collector version: 0.66.0

OpenTelemetry Collector config file:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14317
  otlp/dummy: # Dummy receiver for the metrics pipeline
    protocols:
      grpc:
        endpoint: localhost:65535

processors:
  servicegraph:
    metrics_exporter: prometheus/servicegraph # Exporter to send metrics to
    dimensions: [cluster, namespace] # Additional dimensions (labels) to be added to the metrics extracted from the resource and span attributes
    store: # Configuration for the in-memory store
      ttl: 2s # Value to wait for an edge to be completed
      max_items: 200 # Amount of edges that will be stored in the storeMap      

exporters:
  prometheus/servicegraph:
    endpoint: 0.0.0.0:9091  # to prometheus
  otlp:
    endpoint: http://localhost:4317  # to jaeger
    tls:
      insecure: true 
  logging:
    logLevel: debug    

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [servicegraph]
      exporters: [logging, otlp]
    metrics/servicegraph:
      receivers: [otlp]
      processors: []
      exporters: [prometheus/servicegraph]

When I refresh http://localhost:9091/metrics in the browser, I see that kafka_producer_connection_count keeps changing between counter and gauge:

# HELP kafka_producer_connection_count The current number of active connections.
# TYPE kafka_producer_connection_count counter
kafka_producer_connection_count{client_id="producer-1",job="otel-demo-provider",kafka_version="3.1.1",spring_id="kafkaProducerFactory.producer-1"} 1
# HELP kafka_producer_connection_count The current number of active connections.
# TYPE kafka_producer_connection_count gauge
kafka_producer_connection_count{client_id="producer-1",job="otel-demo-provider"} 1

tuhao1020 avatar Nov 24 '22 08:11 tuhao1020

Hey @tuhao1020, what kind of metrics does the javaagent itself export, excluding the collector? Let's first make sure there's no interference on the collector side.
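A minimal way to check that, assuming javaagent 1.20+, is to point the agent's metrics away from the collector entirely and scrape the agent's built-in Prometheus exporter directly (the port below is just an example):

-Dotel.metrics.exporter=prometheus
-Dotel.exporter.prometheus.port=9464

Then http://localhost:9464/metrics shows exactly what the agent produces, which can be compared against what the collector's Prometheus exporter serves on port 9091.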

mateuszrzeszutek avatar Nov 24 '22 10:11 mateuszrzeszutek

@mateuszrzeszutek The metrics exported by the Java agent always keep the gauge type. Do you mean the collector modified the type? Theoretically, the collector does not modify the type, right?

tuhao1020 avatar Nov 24 '22 11:11 tuhao1020

Honestly, I have no idea whether the collector modifies it or not, which is why we should first try to pinpoint which of the two (agent or collector) causes this to happen.

mateuszrzeszutek avatar Nov 24 '22 13:11 mateuszrzeszutek

@mateuszrzeszutek Does #7271 have anything to do with this? I'm using Kafka 3.3.1, but the kafka_version label on the metrics is 3.1.1.

tuhao1020 avatar Nov 25 '22 03:11 tuhao1020

No, that PR is about Spring Kafka, which has a different versioning scheme from Kafka.

mateuszrzeszutek avatar Nov 25 '22 10:11 mateuszrzeszutek

Same problem.

jojotong avatar Dec 09 '22 07:12 jojotong

Hi @mateuszrzeszutek, I can still reproduce this error with a simple Spring Kafka demo (https://github.com/MaheshIare/spring-boot-kafka-demo/tree/master?tab=readme-ov-file).

Any ideas on how to troubleshoot this?


Java agent version: 1.24.0. Using the configuration below.

Java env config:

OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:14318;OTEL_EXPORTER_PROMETHEUS_PORT=10000;OTEL_METRICS_EXPORTER=otlp;OTEL_SERVICE_NAME=test-kfk

CLI arguments:

-Dotel.instrumentation.runtime-metrics.experimental-metrics.enabled=true

VM:

-javaagent:/Users/yuan/Dev/IdeaProjects/otel-java-instrumentation/alauda-extension/build/libs/opentelemetry-javaagent-ext.jar

OTel Collector version: 0.100.0. Config:

extensions:
# The health_check extension is mandatory for this chart.
# Without the health_check extension the collector will fail the readiness and liveliness probes.
# The health_check extension can be modified, but should never be removed.
  health_check: {}
  memory_ballast:
    size_in_percentage: 40
receivers:
  otlp/traces:
    protocols:
      grpc:
        endpoint: :14317
  otlp/metrics:
    protocols:
      grpc:
        endpoint: :14318
  zipkin:

exporters:
  logging:
    loglevel: info
  otlp/metrics:
    endpoint: :14318
    tls:
      insecure: true
  prometheus:
    endpoint: :8889
service:
  extensions:
    - health_check
    - memory_ballast
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      address: :8888
  pipelines:
    metrics:
      receivers: [otlp/metrics]
      exporters: [prometheus]

OTel Collector log as follows:

2024-05-29T18:44:55.442+0800    error   [email protected]/log.go:23    error gathering metrics: collected metric kafka_consumer_connection_count label:{name:"client_id"  value:"consumer-c92f3eab-2f4f-4e96-a394-1983d69e24ae-0"}  label:{name:"job"  value:"test-kfk"}  gauge:{value:2} should be a Counter
        {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/log.go:23
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
        github.com/prometheus/[email protected]/prometheus/promhttp/http.go:144
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2122
net/http.(*ServeMux).ServeHTTP
        net/http/server.go:2500
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
        go.opentelemetry.io/collector/config/[email protected]/compression.go:147
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:212
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
        go.opentelemetry.io/collector/config/[email protected]/clientinfohandler.go:28
net/http.serverHandler.ServeHTTP
        net/http/server.go:2936
net/http.(*conn).serve
        net/http/server.go:1995
2024-05-29T18:44:55.443+0800    error   [email protected]/log.go:23    error gathering metrics: collected metric kafka_consumer_connection_count label:{name:"client_id"  value:"consumer-c92f3eab-2f4f-4e96-a394-1983d69e24ae-0"}  label:{name:"job"  value:"test-kfk"}  label:{name:"kafka_version"  value:"2.6.0"}  label:{name:"spring_id"  value:"kafkaConsumerFactory.consumer-c92f3eab-2f4f-4e96-a394-1983d69e24ae-0"}  counter:{value:2} should be a Gauge
        {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/log.go:23
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
        github.com/prometheus/[email protected]/prometheus/promhttp/http.go:144
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2122
net/http.(*ServeMux).ServeHTTP
        net/http/server.go:2500
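For what it's worth, the two conflicting streams in the log above carry different label sets: one has only client_id and job, the other also has kafka_version and spring_id. That suggests the same metric name is being registered by two different sources with different instrument types. If that turns out to be the cause, one possible collector-side workaround is to drop one of the duplicate streams with the filter processor before they reach the Prometheus exporter. This is only a sketch against the 0.100.0 config above, and the spring.id attribute key is an assumption (check the actual key with the logging exporter first):

processors:
  filter/kafka-duplicates:
    error_mode: ignore
    metrics:
      datapoint:
        # drop data points carrying the (assumed) spring.id attribute, i.e. one of the two duplicate streams
        - 'attributes["spring.id"] != nil'

service:
  pipelines:
    metrics:
      receivers: [otlp/metrics]
      processors: [filter/kafka-duplicates]
      exporters: [prometheus]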

fyuan1316 avatar May 29 '24 10:05 fyuan1316

Is there anyone who can fix this? I also have the same issue, and only with the Kafka metrics.

quanbisen avatar Sep 12 '24 06:09 quanbisen

Which version of the agent do you use, @quanbisen? I resolved this issue by upgrading to a higher 1.x version.

fyuan1316 avatar Sep 12 '24 13:09 fyuan1316

> Which version of the agent do you use, @quanbisen? I resolved this issue by upgrading to a higher 1.x version.

I use opentelemetry-javaagent version 2.7.0.

quanbisen avatar Sep 16 '24 14:09 quanbisen

> Which version of the agent do you use, @quanbisen? I resolved this issue by upgrading to a higher 1.x version.

And a partial excerpt of my collector's output log is below:

* collected metric kafka_producer_byte_total label:{name:"client_id" value:"producer-1"} label:{name:"host_arch" value:"amd64"} label:{name:"host_name" value:"test01-01"} label:{name:"instance" value:"e16e5e18-3d97-494b-b453-599114fd40fb"} label:{name:"job" value:"lebo-desk"} label:{name:"os_description" value:"Linux 3.10.0-1160.88.1.el7.x86_64"} label:{name:"os_type" value:"linux"} label:{name:"process_command_line" value:"/usr/java/jdk1.8.0_101/jre/bin/java -javaagent:../opentelemetry-agent/opentelemetry-javaagent.jar -Dotel.service.name=lebo-desk -Dotel.exporter.otlp.endpoint=http://10.0.8.48:4318 -Dspring.cloud.nacos.discovery.server-addr=nacos.lebo.lc:80 -Dspring.cloud.nacos.config.server-addr=nacos.lebo.lc:80 -Xms1000m -Xmx1000m lebo-desk.jar --spring.profiles.active=test"} label:{name:"process_executable_path" value:"/usr/java/jdk1.8.0_101/jre/bin/java"} label:{name:"process_pid" value:"2900360"} label:{name:"process_runtime_description" value:"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.101-b13"} label:{name:"process_runtime_name" value:"Java(TM) SE Runtime Environment"} label:{name:"process_runtime_version" value:"1.8.0_101-b13"} label:{name:"service_instance_id" value:"e16e5e18-3d97-494b-b453-599114fd40fb"} label:{name:"service_name" value:"lebo-desk"} label:{name:"telemetry_distro_name" value:"opentelemetry-java-instrumentation"} label:{name:"telemetry_distro_version" value:"2.7.0"} label:{name:"telemetry_sdk_language" value:"java"} label:{name:"telemetry_sdk_name" value:"opentelemetry"} label:{name:"telemetry_sdk_version" value:"1.41.0"} label:{name:"topic" value:"lebocloud_campaign_push"} gauge:{value:68528} should be a Counter
* collected metric kafka_producer_response_total label:{name:"client_id" value:"producer-1"} label:{name:"host_arch" value:"amd64"} label:{name:"host_name" value:"test01-01"} label:{name:"instance" value:"3853ea0d-7c32-43fe-9076-535398225ec2"} label:{name:"job" value:"vipauth-out"} label:{name:"node_id" value:"node-0"} label:{name:"os_description" value:"Linux 3.10.0-1160.88.1.el7.x86_64"} label:{name:"os_type" value:"linux"} label:{name:"process_command_line" value:"/usr/java/jdk1.8.0_101/jre/bin/java -javaagent:../opentelemetry-agent/opentelemetry-javaagent.jar -Dotel.service.name=vipauth-out -Dotel.exporter.otlp.endpoint=http://10.0.8.48:4318 -Dlog4j2.formatMsgNoLookups=true -Xms1024m -Xmx1024m VipAuth-out.jar --spring.profiles.active=prd"} label:{name:"process_executable_path" value:"/usr/java/jdk1.8.0_101/jre/bin/java"} label:{name:"process_pid" value:"1433623"} label:{name:"process_runtime_description" value:"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.101-b13"} label:{name:"process_runtime_name" value:"Java(TM) SE Runtime Environment"} label:{name:"process_runtime_version" value:"1.8.0_101-b13"} label:{name:"service_instance_id" value:"3853ea0d-7c32-43fe-9076-535398225ec2"} label:{name:"service_name" value:"vipauth-out"} label:{name:"service_version" value:"0.0.1-SNAPSHOT"} label:{name:"telemetry_distro_name" value:"opentelemetry-java-instrumentation"} label:{name:"telemetry_distro_version" value:"2.7.0"} label:{name:"telemetry_sdk_language" value:"java"} label:{name:"telemetry_sdk_name" value:"opentelemetry"} label:{name:"telemetry_sdk_version" value:"1.41.0"} counter:{value:24163} should be a Gauge
* collected metric kafka_producer_successful_authentication_total label:{name:"client_id" value:"producer-1"} label:{name:"host_arch" value:"amd64"} label:{name:"host_name" value:"test01-01"} label:{name:"instance" value:"99db74e7-5361-4a4d-97c2-5627f5d712a1"} label:{name:"job" value:"user-service-boot"} label:{name:"os_description" value:"Linux 3.10.0-1160.88.1.el7.x86_64"} label:{name:"os_type" value:"linux"} label:{name:"process_command_line" value:"/usr/java/jdk1.8.0_101/jre/bin/java -javaagent:../opentelemetry-agent/opentelemetry-javaagent.jar -Dotel.config.file=otel-config.properties -Dotel.service.name=user-service-boot -Dotel.exporter.otlp.endpoint=http://10.0.8.48:4318 -Xms256m -Xmx712m -jar user-service-boot-1.0.0.jar"} label:{name:"process_executable_path" value:"/usr/java/jdk1.8.0_101/jre/bin/java"} label:{name:"process_pid" value:"2221142"} label:{name:"process_runtime_description" value:"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.101-b13"} label:{name:"process_runtime_name" value:"Java(TM) SE Runtime Environment"} label:{name:"process_runtime_version" value:"1.8.0_101-b13"} label:{name:"service_instance_id" value:"99db74e7-5361-4a4d-97c2-5627f5d712a1"} label:{name:"service_name" value:"user-service-boot"} label:{name:"service_version" value:"1.0.0"} label:{name:"telemetry_distro_name" value:"opentelemetry-java-instrumentation"} label:{name:"telemetry_distro_version" value:"2.7.0"} label:{name:"telemetry_sdk_language" value:"java"} label:{name:"telemetry_sdk_name" value:"opentelemetry"} label:{name:"telemetry_sdk_version" value:"1.41.0"} counter:{value:0} should be a Gauge
        {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/log.go:23
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
        github.com/prometheus/[email protected]/prometheus/promhttp/http.go:144
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2136
net/http.(*ServeMux).ServeHTTP
        net/http/server.go:2514
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
        go.opentelemetry.io/collector/config/[email protected]/compression.go:147
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:229
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:81
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2136
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
        go.opentelemetry.io/collector/config/[email protected]/clientinfohandler.go:28
net/http.serverHandler.ServeHTTP
        net/http/server.go:2938
net/http.(*conn).serve
        net/http/server.go:2009

quanbisen avatar Sep 19 '24 01:09 quanbisen

Hi @quanbisen, I think we would need a minimal sample app that reproduces the issue in order to understand what's going on.

trask avatar Dec 23 '24 18:12 trask

This issue has been labeled as stale due to lack of activity and needing author feedback. It will be automatically closed if there is no further activity over the next 7 days.

github-actions[bot] avatar Oct 10 '25 03:10 github-actions[bot]