hawkular-openshift-agent

HOSA fails to scrape its own metrics when identity is set in config.yaml

Open ljuaneda opened this issue 8 years ago • 3 comments

Hi,

This is following issue #150

I'm using the OpenShift master-proxy certs in a secret to gather metrics from Jolokia endpoints. This currently works with commit 860030232e974c1abd27dc054e3d4ba47c0070a4 on OSCP v3.5.5.15.

I'm now trying the new Docker image hawkular/hawkular-openshift-agent, which pulled version 1.4.2, but HOSA fails to scrape its own metrics on :8443:

I1025 08:06:11.320386       1 prometheus_metrics_collector.go:97] DEBUG: Told to collect all Prometheus metrics from [https://10.130.5.68:8443/metrics]
2017/10/25 08:06:11 http: TLS handshake error from 10.130.5.68:42456: read tcp 10.130.5.68:8443->10.130.5.68:42456: read: connection reset by peer
W1025 08:06:11.324820       1 metrics_collector_manager.go:186] Failed to collect metrics from [default|hawkular-openshift-agent-8ffjs|prometheus|https://10.130.5.68:8443/metrics] at [Wed, 25 Oct 2017 08:06:11 +0000]. err=Failed to collect Prometheus metrics from [https://10.130.5.68:8443/metrics]. err=Cannot scrape Prometheus URL [https://10.130.5.68:8443/metrics]: err=Get https://10.130.5.68:8443/metrics: x509: cannot validate certificate for 10.130.5.68 because it doesn't contain any IP SANs

My guess is that HOSA is not expecting unsecured connections:

$ oc exec hawkular-openshift-agent-8ffjs -- curl -vks https://10.130.5.68:8443/metrics
* About to connect() to 10.130.5.68 port 8443 (#0)
*   Trying 10.130.5.68...
* Connected to 10.130.5.68 (10.130.5.68) port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=system:master-proxy
*       start date: Feb 01 16:54:03 2017 GMT
*       expire date: Feb 01 16:54:04 2019 GMT
*       common name: system:master-proxy
*       issuer: CN=openshift-signer@1485968044
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.130.5.68:8443
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 6308
< Content-Type: text/plain; version=0.0.4
< Date: Wed, 25 Oct 2017 08:15:28 GMT
<
{ [data not shown]
* Connection #0 to host 10.130.5.68 left intact
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.00022610500000000002
go_gc_duration_seconds{quantile="0.25"} 0.00024556400000000004
go_gc_duration_seconds{quantile="0.5"} 0.000258036
go_gc_duration_seconds{quantile="0.75"} 0.000269366
go_gc_duration_seconds{quantile="1"} 0.000530231
go_gc_duration_seconds_sum 0.004801416
go_gc_duration_seconds_count 17
...

My current configuration:

$ cat hawkular-openshift-agent-configuration.cm-new.yaml
apiVersion: v1
kind: List
metadata: {}
items:
- apiVersion: v1
  kind: ConfigMap
  metadata:
    labels:
      metrics-infra: agent
    name: hawkular-openshift-agent-configuration
    namespace: default
  data:
    config.yaml: |
      kubernetes:
        tenant: ${POD:namespace_name}
      hawkular_server:
        url: https://hawkular-metrics.openshift-infra.svc.cluster.local
        credentials:
          username: secret:openshift-infra/hawkular-metrics-account/hawkular-metrics.username
          password: secret:openshift-infra/hawkular-metrics-account/hawkular-metrics.password
        ca_cert_file: secret:openshift-infra/hawkular-metrics-certificate/hawkular-metrics-ca.certificate
      emitter:
        status_enabled: true
        metrics_enabled: true
        health_enabled: true
      identity:
        cert_file: /master-proxy/master.proxy-client.crt
        private_key_file: /master-proxy/master.proxy-client.key
      collector:
        max_metrics_per_pod: 500
        minimum_collection_interval: 10s
        default_collection_interval: 30s
        metric_id_prefix: pod/${POD:uid}/custom/
        pod_label_tags_prefix: _empty_
        tags:
          metric_name: ${METRIC:name}
          description: ${METRIC:description}
          units: ${METRIC:units}
          namespace_id: ${POD:namespace_uid}
          namespace_name: ${POD:namespace_name}
          node_name: ${POD:node_name}
          pod_id: ${POD:uid}
          pod_ip: ${POD:ip}
          pod_name: ${POD:name}
          pod_namespace: ${POD:namespace_name}
          hostname: ${POD:hostname}
          host_ip: ${POD:host_ip}
          labels: ${POD:labels}
          cluster_name: ${POD:cluster_name}
          resource_version: ${POD:resource_version}
          type: pod
          collector: hawkular_openshift_agent
          custom_metric: true
    hawkular-openshift-agent: |
      endpoints:
      - type: prometheus
        protocol: "https"
        port: 8443
        path: /metrics
        collection_interval: 30s
- apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    creationTimestamp: null
    labels:
      metrics-infra: agent
      name: hawkular-openshift-agent
    name: hawkular-openshift-agent
  spec:
    selector:
      matchLabels:
        name: hawkular-openshift-agent
    template:
      metadata:
        creationTimestamp: null
        labels:
          metrics-infra: agent
          name: hawkular-openshift-agent
      spec:
        containers:
        - command:
          - /opt/hawkular/hawkular-openshift-agent
          - -config
          - /hawkular-openshift-agent-configuration/config.yaml
          - -v
          - "4"
          env:
          - name: K8S_POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: K8S_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: EMITTER_STATUS_CREDENTIALS_USERNAME
            valueFrom:
              secretKeyRef:
                key: username
                name: hawkular-openshift-agent-status
          - name: EMITTER_STATUS_CREDENTIALS_PASSWORD
            valueFrom:
              secretKeyRef:
                key: password
                name: hawkular-openshift-agent-status
          image: hawkular/hawkular-openshift-agent:1.4.2
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 30
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 1
          name: hawkular-openshift-agent
          resources: {}
          terminationMessagePath: /dev/termination-log
          volumeMounts:
          - mountPath: /hawkular-openshift-agent-configuration
            name: hawkular-openshift-agent-configuration
          - mountPath: /master-proxy
            name: master-proxy
        dnsPolicy: ClusterFirst
        nodeSelector:
          hawkular-openshift-agent: "true"
        restartPolicy: Always
        securityContext: {}
        serviceAccount: hawkular-openshift-agent
        serviceAccountName: hawkular-openshift-agent
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: hawkular-openshift-agent-configuration
          name: hawkular-openshift-agent-configuration
        - configMap:
            defaultMode: 420
            name: hawkular-openshift-agent-configuration
          name: hawkular-openshift-agent
        - name: master-proxy
          secret:
            defaultMode: 420
            secretName: master-proxy

Regards,

Ludovic

ljuaneda Oct 25 '17 08:10

By the way, it doesn't seem to bother the liveness probe:

$ oc get pods hawkular-openshift-agent-8ffjs
NAME                             READY     STATUS    RESTARTS   AGE
hawkular-openshift-agent-8ffjs   1/1       Running   0          17m

ljuaneda Oct 25 '17 08:10

Check this commit. It shows a change in the ca_cert_file setting that you may also have to incorporate:

https://github.com/hawkular/hawkular-openshift-agent/commit/7c7d7f56d614dde520e3fc44f88f4335f801ca7f

jmazzitelli Oct 25 '17 11:10

This doesn't work with OSCP version 3.5:

$ oc -n openshift-infra get secret | grep hawkular-metrics
hawkular-metrics-account                           Opaque                                2         186d
hawkular-metrics-certificate                       Opaque                                2         186d
hawkular-metrics-secrets                           Opaque                                9         186d
$ oc -n openshift-infra get secret hawkular-metrics-certificate -o json | jq -r '.data|keys[]'
hawkular-metrics-ca.certificate
hawkular-metrics.certificate

This is related to openshift/origin-metrics. Unfortunately, that change applies to origin-metrics 3.6 or later; there is no backport for origin-metrics 3.5.

ljuaneda Oct 25 '17 14:10