Provide guidelines for how to structure a processor config

Open danielnelson opened this issue 7 years ago • 2 comments

Feature Request

A common task in a processor is to pick a tag, field or measurement, modify the key or the value, and save it over the old value or to a new field/tag. Now that we are beginning to build a library of utility processors of this style, I have noticed that the naming conventions and configuration style are not very uniform between them.

To come up with some guidelines and best practices, I did a survey of the plugins to see if we can take steps to unify their feel. Once finalized, this information should be included in the contributing documentation or on the wiki.

I primarily considered the following processors. The remaining processors cover other tasks, and I don't think they need to be unified to the same degree:

  • rename (to be released 1.8)
  • enum (to be released 1.8)
  • regex (released in 1.7)
  • converter (released in 1.7)
  • strings (unmerged)
  • replace (unmerged)
  • math (unmerged)

I identified two main styles for structuring the plugin configuration. I refer to them below as the tag/field table style and the operation table style.

Tag/Field table configs have a subtable named tag/tags or field/fields:

[[processors.rename]]
  [[processors.rename.tag]]
    from = "hostname"
    to = "host"

Operation tables have a table named after the operation to perform:

[[processors.strings]]
  [[processors.strings.lowercase]]
    tag = "method"

The current breakdown between the styles is as such:

  • rename (tags/fields)
  • enum (tags/fields)
  • regex (tags/fields)
  • converter (tags/fields)
  • strings (operation)
  • replace (no subtables)
  • drop_strings (no options)
  • math (combo)

Converting between the two formats is straightforward. Here is the rename processor in both styles:

[[processors.rename]]
  [[processors.rename.tag]]
    from = "hostname"
    to = "host"

[[processors.rename]]
  [[processors.rename.replace]]
    tag = "hostname"
    dest = "host"
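
The mapping between the two layouts can be sketched in Go. This is only an illustration: the types `Rule`, `TagFieldConfig`, and `Replace` and the function `ToOperations` are hypothetical names made up for this example, not Telegraf code. It shows that each from/to entry becomes one operation entry, with the selector moving from the subtable name into the entry itself.

```go
package main

import "fmt"

// Rule is one entry of a tag/field-style subtable (hypothetical type).
type Rule struct {
	From string
	To   string
}

// TagFieldConfig is the tag/field layout: the subtable name says what is selected.
type TagFieldConfig struct {
	Tag   []Rule
	Field []Rule
}

// Replace is one entry of the operation layout: the selector lives in the entry.
type Replace struct {
	Tag   string
	Field string
	Dest  string
}

// ToOperations rewrites a tag/field-style config as a list of replace operations.
func ToOperations(c TagFieldConfig) []Replace {
	var ops []Replace
	for _, r := range c.Tag {
		ops = append(ops, Replace{Tag: r.From, Dest: r.To})
	}
	for _, r := range c.Field {
		ops = append(ops, Replace{Field: r.From, Dest: r.To})
	}
	return ops
}

func main() {
	old := TagFieldConfig{Tag: []Rule{{From: "hostname", To: "host"}}}
	for _, op := range ToOperations(old) {
		fmt.Printf("tag=%q field=%q dest=%q\n", op.Tag, op.Field, op.Dest)
	}
}
```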

While today we have more plugins using tags/fields, I think the operation style config has a couple advantages:

  • No potential conflict with the tags option. That option is not currently available on processors, but I think this is only an oversight and it would be nice to add. A fields option would have the same conflict if/when it is added.

  • Operation-based tables make it easier to support operations with different argument sets, and the arguments are checked for errors automatically, because each table can be mapped to a different type in the plugin struct.
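
The second point can be sketched in Go. This is a minimal illustration under stated assumptions, not the actual plugin code: the type names `Strings`, `Lowercase`, and `TrimLeft`, the `toml` struct tags, and the flat map standing in for a metric's fields are all made up for the example. Because each operation table has its own struct, an option such as `cutset` exists only where it makes sense.

```go
package main

import (
	"fmt"
	"strings"
)

// Lowercase mirrors a [[processors.strings.lowercase]] table (hypothetical type).
type Lowercase struct {
	Field string `toml:"field"`
}

// TrimLeft mirrors a [[processors.strings.trimleft]] table. It takes a cutset
// argument that Lowercase has no field for (hypothetical type).
type TrimLeft struct {
	Field  string `toml:"field"`
	Cutset string `toml:"cutset"`
}

// Strings is the plugin struct: one slice per operation table.
type Strings struct {
	Lowercase []Lowercase `toml:"lowercase"`
	TrimLeft  []TrimLeft  `toml:"trimleft"`
}

// Apply runs every configured operation against a flat map of string fields,
// overwriting the selected field in place.
func (s *Strings) Apply(fields map[string]string) {
	for _, op := range s.Lowercase {
		if v, ok := fields[op.Field]; ok {
			fields[op.Field] = strings.ToLower(v)
		}
	}
	for _, op := range s.TrimLeft {
		if v, ok := fields[op.Field]; ok {
			fields[op.Field] = strings.TrimLeft(v, op.Cutset)
		}
	}
}

func main() {
	p := &Strings{
		Lowercase: []Lowercase{{Field: "method"}},
		TrimLeft:  []TrimLeft{{Field: "cs-host", Cutset: "cs-"}},
	}
	fields := map[string]string{"method": "GET", "cs-host": "cs-web"}
	p.Apply(fields)
	fmt.Println(fields["method"], fields["cs-host"])
}
```

With tag/field tables, by contrast, every entry under `tag` or `field` would share one struct, so operation-specific arguments would all have to live side by side in that struct.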

Proposal:

Use operation style subtables instead of tag/field/measurement subtables:

Instead of:

[[processors.rename]]
  [[processors.rename.tag]]
    from = "hostname"
    to = "host"

Prefer:

[[processors.rename]]
  [[processors.rename.replace]]
    tag = "hostname"
    dest = "host"

Naming fields

Several conventions have been suggested:

  • from / to
  • old / new
  • source / dest
  • key / result_key
  • tag|field|name / result_key

This is basically a matter of taste, so I'm just going to go with what looks best to me:

Select the tag, field, or measurement using:

  • tag
  • tags (list)
  • field
  • fields (list)
  • measurement

To overwrite the old value, omit the destination option. To write the result to a new field or tag, use dest.

Examples

Remember that we aren't going to break compatibility on already released plugins, so regex and converter are just examples of how I would do it if writing them today.

Rename and enum are already merged for 1.8. I may update these before the release, time permitting.

## already released; no change planned
[[processors.regex]]
  [[processors.regex.repl]]
    tag = "resp_code"
    pattern = "^(\\d)\\d\\d$"
    replacement = "${1}xx"

## already released; no change planned
[[processors.converter]]
  [[processors.converter.convert]]
    field = "resp_code"
    type = "integer"

[[processors.rename]]
  [[processors.rename.replace]]
    measurement = "network_interface_throughput"
    dest = "throughput"

  [[processors.rename.replace]]
    tag = "hostname"
    dest = "host"

  [[processors.rename.replace]]
    field = "lower"
    dest = "min"

[[processors.enum]]
  [[processors.enum.map]]
    field = "name"
    dest = "mapped"
    default = 0
    [processors.enum.map.value_mappings]
      value1 = 1
      value2 = 2

[[processors.math]]
  [[processors.math.unary]]
    function = "abs"
    fields = ["io_time", "read_time", "write_time"]

[[processors.strings]]
  [[processors.strings.lowercase]]
    tag = "method"
  [[processors.strings.uppercase]]
    field = "cs-host"
  [[processors.strings.trimleft]]
    field = "cs-host"
    cutset = "cs-"
    dest = "trimmed"
  [[processors.strings.replace]]
    measurement = "*"
    old = ":"
    new = "_"

danielnelson, Aug 31 '18 00:08

We should consider how a processor could support operations on tag keys and field keys, instead of only the values. It would be nice to be able to use the strings replace operation on a field key, for example: https://github.com/influxdata/telegraf/issues/5173

danielnelson, Dec 27 '18 02:12

We would like to put some documentation around how to structure a processor config. If anyone has anything they would like highlighted please share.

sjwang90, Mar 08 '21 23:03