hawkular-openshift-agent

Unable to store metrics

Open ctron opened this issue 8 years ago • 5 comments

Running HOSA with oc cluster up --metrics causes the following error to appear in the log every few seconds when watching it with oc logs -f hawkular-openshift-agent-123456:

W0404 09:25:22.691035       1 metrics_storage.go:164] Failed to store metrics. err=Post https://hawkular-metrics.openshift-infra.svc.cluster.local/hawkular/metrics/gauges/raw: x509: certificate is valid for hawkular-metrics, metrics-openshift-infra.10.32.64.18.xip.io, not hawkular-metrics.openshift-infra.svc.cluster.local

Output of oc version:

oc v1.4.1+3f9807a
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.32.64.18:8443
openshift v1.4.1+3f9807a
kubernetes v1.4.0+776c994

I followed the steps here: [1] (with IMAGE_VERSION=1.4.0.Final) and here [2].

[1] https://github.com/openshift/origin-metrics#deploying-the-hawkular-openshift-agent
[2] https://github.com/hawkular/hawkular-openshift-agent/tree/master/examples/jolokia-wildfly-example

ctron, Apr 04 '17 09:04

Known issue and addressed in a newer version of OpenShift.

Here is a quick summary. When people ran with multitenancy (ovs-multitenant), things didn't work (see Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1421060). We changed some things to make that work, and this required the agent to be deployed in the "default" project. After the move to "default", however, the agent could no longer store metrics in Origin Metrics. The certificate generated for Origin Metrics did not include the internal service hostname the agent connects to, and that is the error you are seeing. The certificate was fixed in a later version of OpenShift. See https://github.com/openshift/origin-metrics/pull/302 and https://github.com/hawkular/hawkular-openshift-agent/pull/124

The short of it is, if you are running with ovs-multitenant you will need to upgrade OpenShift. If you are not, you can deploy the agent in the same namespace as Origin Metrics (that is, "openshift-infra") and I believe things should work.

jmazzitelli, Apr 04 '17 12:04

The following trick worked for me:

oc process -f /usr/share/openshift/hosted/metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME="hawkular-metrics.openshift-infra.svc.cluster.local,hawkular-metrics.example.com" -v MODE=refresh -v CONTINUE_ON_ERROR=true -v IGNORE_PREFLIGHT=true | oc create -f -

It refreshes the entire openshift-metrics configuration. There is an error while recreating the route. Ignore it, and the new certificates will then be pushed to the existing route.

ljuaneda, Apr 04 '17 16:04

Thanks @jmazzitelli for the help. However, I forgot to mention that I had already deployed the agent to openshift-infra.

I am unsure what the following means regarding oc cluster up:

The short of it is, if you are running with ovs-multitenant you will need to upgrade OpenShift. If you are not, you can deploy the agent in the same namespace as Origin Metrics (that is, "openshift-infra") and I believe things should work.

Does oc cluster up run with "ovs-multitenant"?

ctron, Apr 05 '17 07:04

@ctron if you have the agent deployed in openshift-infra, then the agent and Origin Metrics are in the same project. In that case you don't need to configure the agent to talk to Origin Metrics via the "bad" hostname "hawkular-metrics.openshift-infra.svc.cluster.local". Instead, configure the agent to use the hostname assigned to the Origin Metrics route. You can see this hostname in the OpenShift Console on the openshift-infra overview page, provided you have an admin user with permission to see the openshift-infra project. It will be something like "https://metrics-openshift-infra.#.#.#.#.xip.io/", where #.#.#.# is your master's external IP address. Based on your original error message, yours looks like it might be "http://metrics-openshift-infra.10.32.64.18.xip.io".

So you need to change your agent's main YAML configuration to use that hostname instead of "hawkular-metrics.openshift-infra.svc.cluster.local", something like:

...
hawkular_server:
  url: http://metrics-openshift-infra.10.32.64.18.xip.io
...

To see where this setting is in our agent's default config, see here: https://github.com/hawkular/hawkular-openshift-agent/blob/master/deploy/openshift/hawkular-openshift-agent-configmap.yaml#L14
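Before editing hawkular_server.url, you could check which candidate URL's host is covered by the Origin Metrics certificate. The Go sketch below does this with hostCoveredByCert, a hypothetical helper. It parses the URL, takes its host with net/url's Hostname, and checks that host against the SAN list quoted in the original error. The check only matters when the agent connects over TLS (an https URL). The candidate URLs come from the log line and the route hostname format discussed above.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net/url"
)

// hostCoveredByCert (hypothetical) reports whether the host part of rawURL
// would pass TLS hostname verification against a certificate with these SANs.
func hostCoveredByCert(rawURL string, sans []string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(u.Hostname()) == nil, nil
}

func main() {
	// SANs copied from the error message in this issue.
	sans := []string{"hawkular-metrics", "metrics-openshift-infra.10.32.64.18.xip.io"}
	candidates := []string{
		"https://hawkular-metrics.openshift-infra.svc.cluster.local", // agent default (from the log)
		"https://metrics-openshift-infra.10.32.64.18.xip.io",         // route hostname
	}
	for _, c := range candidates {
		ok, err := hostCoveredByCert(c, sans)
		if err != nil {
			fmt.Println(c, "parse error:", err)
			continue
		}
		fmt.Printf("%s covered by cert: %v\n", c, ok)
	}
}
```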

BTW: to find out whether your OpenShift is configured with ovs_multitenant, see these OpenShift docs (in short, look in your master's YAML config and check the value of its networkConfig/networkPluginName property): https://docs.openshift.com/enterprise/3.1/install_config/configuring_sdn.html
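The check described above can be sketched in Go. It is a naive line scan, not a YAML parser. It assumes networkPluginName appears as an indented key directly under a top-level "networkConfig:" line, and that a multitenant setup uses the plugin name "redhat/openshift-ovs-multitenant". The sample config text and the helper name networkPluginName are hypothetical. For real configs, a proper YAML library would be more robust.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// networkPluginName (hypothetical) scans master-config.yaml text for
// networkConfig.networkPluginName. Naive: it only tracks whether the current
// indented line belongs to the top-level "networkConfig:" section.
func networkPluginName(masterConfig string) (string, bool) {
	inNetworkConfig := false
	sc := bufio.NewScanner(strings.NewReader(masterConfig))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // skip blank lines and comments
		}
		// A non-indented line starts a new top-level section.
		if !strings.HasPrefix(line, " ") && !strings.HasPrefix(line, "\t") {
			inNetworkConfig = trimmed == "networkConfig:"
			continue
		}
		if inNetworkConfig && strings.HasPrefix(trimmed, "networkPluginName:") {
			v := strings.TrimSpace(strings.TrimPrefix(trimmed, "networkPluginName:"))
			return strings.Trim(v, `"'`), true // strip optional quotes
		}
	}
	return "", false
}

func main() {
	// Hypothetical excerpt of a master-config.yaml.
	sample := `networkConfig:
  clusterNetworkCIDR: 10.128.0.0/14
  networkPluginName: "redhat/openshift-ovs-multitenant"
servingInfo:
  bindAddress: 0.0.0.0:8443
`
	name, found := networkPluginName(sample)
	multitenant := found && strings.HasSuffix(name, "ovs-multitenant")
	fmt.Println(name, multitenant)
}
```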

To read about ovs_multitenant, how it affects the agent, and why (for example) the agent is deployed in the "default" project by default, see https://docs.openshift.org/latest/architecture/additional_concepts/sdn.html

jmazzitelli, Apr 05 '17 13:04

Thanks for the explanation. That is what I ended up doing to work around it, and it seems like the way to go.

Maybe there should be a note at: https://github.com/openshift/origin-metrics#deploying-the-hawkular-openshift-agent

ctron, Apr 05 '17 15:04