opencensus-node

exporter/stackdriver: not auto detecting GKE labels

Open fiws opened this issue 6 years ago • 4 comments

What version of OpenCensus are you using?

    "@opencensus/core": "0.0.19",
    "@opencensus/exporter-stackdriver": "0.0.19",

What version of Node are you using?

12.14.1

What did you do?

We have a simple setInterval that monitors the Node.js memory usage (basically a hello world example).

(this is a hapi.js plugin)

const { globalStats, MeasureUnit, AggregationType } = require('@opencensus/core');
const { StackdriverStatsExporter } = require('@opencensus/exporter-stackdriver');

const EXPORT_INTERVAL = Number(process.env.EXPORT_INTERVAL) || 60; // seconds

const MEMORY_RSS = globalStats.createMeasureInt64(
  'memory_rss',
  MeasureUnit.BYTE,
  'Total memory used'
);

globalStats.registerView(globalStats.createView(
  'nodejs_memory_rss',
  MEMORY_RSS,
  AggregationType.LAST_VALUE,
  [],
  'Total memory used by this process',
));

exports.register = async () => {
  const projectId = 'hardcoded-project-name-here';

  const exporter = new StackdriverStatsExporter({
    projectId,
    period: EXPORT_INTERVAL * 1000,
  });

  // Pass the created exporter to Stats
  globalStats.registerExporter(exporter);

  setInterval(() => {
    const memoryStats = process.memoryUsage();
    globalStats.record([{
      measure: MEMORY_RSS,
      value: memoryStats.rss,
    }]);
  }, 10000);
};

exports.name = 'monitor';

What did you expect to see?

The stats should automatically carry Kubernetes labels (pod, container, etc.) when deployed in a GKE container.

What did you see instead?

It makes no difference whether the code runs locally or on GKE. In both cases the stats end up in the "global" resource without any additional labels (just project_id).

Additional context

  • The container has the env variable KUBERNETES_SERVICE_HOST=10.111.0.1
  • We do not use GOOGLE_APPLICATION_CREDENTIALS as it works without that on GKE

Any help is welcome. We would like to debug this, but don't see an easy way. Is there some verbose logging mode in this exporter?

fiws avatar Jan 09 '20 14:01 fiws

I think I've found the problem.

The container needs the NAMESPACE and CONTAINER_NAME environment variables. Otherwise this line resets the resource type and all labels to defaults.

I don't get why this line is there. It seems to make things worse for me. The required env variables should at least be documented.
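A startup check along these lines might surface the problem earlier. It is only a sketch: `missingGkeEnv` is a hypothetical helper, not part of the exporter, and the list contains just the two variables named above rather than any documented contract.

```javascript
// Hypothetical helper, not part of @opencensus/exporter-stackdriver.
// The variable list is an assumption based on the two names mentioned above.
const REQUIRED_ENV = ['NAMESPACE', 'CONTAINER_NAME'];

// Returns the names of required variables that are unset or empty.
function missingGkeEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

const missing = missingGkeEnv();
if (missing.length > 0) {
  console.warn(
    `Missing env variables: ${missing.join(', ')}. ` +
    'Metrics may end up under the "global" resource.'
  );
}
```

Running this before registering the exporter at least makes the silent fallback visible in the container logs.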

fiws avatar Jan 09 '20 15:01 fiws

Thanks for reporting this!

AFAIK the Stackdriver exporter requires us to pass every label with a value; in the case of GKE, these labels. If any of the expected labels is missing, the exporter will be unhappy and throw an exception.

Something like this:

One or more TimeSeries could not be written: The set of resource labels is incomplete. 
Missing labels: (<label>).: timeSeries[<number>]

This line was added as a preemptive measure to handle the missing-labels case.
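To make the trade-off concrete, the fallback could be sketched roughly as below. This is not the exporter's source: `resolveResource` is a hypothetical function, and the label names are those of Cloud Monitoring's `k8s_container` resource type; whether the exporter checks exactly this set is an assumption.

```javascript
// Illustrative only: mimics the behaviour described above, not the exporter's code.
// Label names follow the k8s_container monitored resource type (assumed set).
const K8S_CONTAINER_LABELS = [
  'project_id',
  'location',
  'cluster_name',
  'namespace_name',
  'pod_name',
  'container_name',
];

// Hypothetical function: picks a monitored resource from the detected labels.
function resolveResource(projectId, detectedLabels) {
  const labels = { ...detectedLabels, project_id: projectId };
  const complete = K8S_CONTAINER_LABELS.every((key) => Boolean(labels[key]));
  if (!complete) {
    // An incomplete label set would be rejected by the API,
    // so fall back to the generic "global" resource.
    return { type: 'global', labels: { project_id: projectId } };
  }
  return { type: 'k8s_container', labels };
}
```

With a fallback like this, one missing label such as `namespace_name` is enough to drop all the others, which would match what was reported above.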

mayurkale22 avatar Jan 09 '20 16:01 mayurkale22

Ok, that makes sense now. Thanks for the response :)

My opinion:

I would prefer it if the namespace and container name were set to "N/A", just to see that the detection is working (better than nothing).

Also: this should be documented.

fiws avatar Jan 09 '20 17:01 fiws

Stumbled over this issue and had the same problem. Linking a related issue that has more details: https://github.com/census-instrumentation/opencensus-python/issues/796

tl;dr: at some point GKE stopped populating containers with NAMESPACE and CONTAINER_NAME, so you now have to add them yourself to the Deployment env, like this:

        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CONTAINER_NAME
          value: my-awesome-service

philicious avatar May 20 '20 20:05 philicious