graylog2-server

Negative values in pipeline rules throughput

Open ryan-carroll-graylog opened this issue 1 year ago • 2 comments

Expected Behavior

The throughput shown for each pipeline rule should always be accurate and never negative.

Current Behavior

Throughput values on the Pipeline rules page toggle between positive and negative.

Steps to Reproduce

  1. A lot of traffic is needed. We couldn't reproduce it ourselves, but the customer offered to help with reproducing it if needed.

  2. The attached recording shows the whole problem live: recording.webm

  3. Create a Pipeline rule that processes a lot of traffic.

  4. Check the throughput values on the Pipeline rules page.

Context

Throughput values on the Pipeline rules page toggle between positive and negative (see the attached recording). This happens on only one pipeline, 'Streamrouting', which was set up 7 years ago and has been running since. The customer has ~70 pipelines; they checked some of them but see negative values only on 'Streamrouting'.

The customer is aware that the first navigation to the Pipeline rules page can show wrong values, but here the wrong values are displayed constantly: they toggle between positive and negative every few seconds without any other interaction with the page.

Values are correct on the Manage Pipelines >> Pipelines overview page; no negative throughput numbers appear there.

One screenshot shows ~300 million msg/s under Throughput while only ~11k are coming in.

Code of the rule that shows millions of messages per second:

rule "continue to next stage"
when
    true
then
    set_field("continue", true);
    remove_field("continue");
end

Screenshot 1: about -385 million msg/s under Throughput

Screenshot 2: ~300 million msg/s under Throughput while only ~11k are coming in

The issue was discussed in Slack (https://graylog.slack.com/archives/C036LC4K744/p1717668058792819). From that discussion it sounds like a known issue, but no one had submitted a bug report until now.

Customer Environment

Graylog Version: 5.2.7
OpenSearch Version: 7.10.2
MongoDB Version: 6.0.12

(created from Zendesk ticket #570)
gz#570

ryan-carroll-graylog • Jun 10 '24 16:06

https://github.com/Graylog2/graylog2-server/issues/19696 may be related too?

damianharouff • Jun 20 '24 16:06

My original theory was that a number is too large for its data type, for example a value above 2,147,483,647 stored in a signed 32-bit integer. I'm not sure that is relevant, though.
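To illustrate the overflow theory, the sketch below emulates what happens when a counter is stored in a signed 32-bit integer: anything above 2,147,483,647 wraps around into negative numbers. The helper `to_int32` is hypothetical and not taken from Graylog; whether Graylog's throughput path actually truncates values this way is unconfirmed.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def to_int32(value: int) -> int:
    """Hypothetical helper: emulate two's-complement wraparound of a signed 32-bit int."""
    return (value + 2**31) % 2**32 - 2**31


# Within range: the value is unchanged.
print(to_int32(INT32_MAX))      # 2147483647
# One past the maximum wraps to the most negative 32-bit value.
print(to_int32(INT32_MAX + 1))  # -2147483648
# A larger counter value turns into a large negative number.
print(to_int32(3_000_000_000))  # -1294967296
```

A wrapped value always lands between -2,147,483,648 and 2,147,483,647, so this mechanism can produce large negative readings like the ones in the screenshots, but by itself it does not explain why the sign flips every few seconds.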

The other thing I am noticing, which I can reproduce somewhat consistently, is that Graylog initially shows a much larger value before showing the accurate metric. I suspect it is showing the cumulative "total" count rather than the rate.

For example, I can sometimes get my pipeline to display a number of about 7 million, which matches the metric's total value:

{
    "full_name": "org.graylog.plugins.pipelineprocessor.ast.Pipeline.62976aa4578cf42110255552.executed",
    "metric": {
        "rate": {
            "total": 7189178,
            "mean": 88.88558165055522,
            "five_minute": 111.75158774985267,
            "fifteen_minute": 114.95970546937819,
            "one_minute": 102.30150129886736
        },
        "rate_unit": "events/second"
    },
    "name": "executed",
    "type": "meter"
}
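One way to check which field of the meter a displayed figure corresponds to is to compare the UI value against each entry under `rate`. A minimal sketch using the payload above; the function `closest_rate_field` is hypothetical, and the displayed values passed in are stand-ins for readings off the pipeline page, not measured numbers.

```python
import json

# The meter payload from the comment above.
payload = """
{
  "full_name": "org.graylog.plugins.pipelineprocessor.ast.Pipeline.62976aa4578cf42110255552.executed",
  "metric": {
    "rate": {
      "total": 7189178,
      "mean": 88.88558165055522,
      "five_minute": 111.75158774985267,
      "fifteen_minute": 114.95970546937819,
      "one_minute": 102.30150129886736
    },
    "rate_unit": "events/second"
  },
  "name": "executed",
  "type": "meter"
}
"""


def closest_rate_field(meter: dict, displayed: float) -> str:
    """Return the name of the rate field whose value is nearest to `displayed`."""
    rate = meter["metric"]["rate"]
    return min(rate, key=lambda field: abs(rate[field] - displayed))


meter = json.loads(payload)
# "About 7 million", as read off the pipeline page: matches the cumulative total.
print(closest_rate_field(meter, 7_000_000))  # total
# A per-second reading of ~102 matches the one-minute moving average instead.
print(closest_rate_field(meter, 102))        # one_minute
```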

In the video below I am going back and forward in the browser to trigger this behavior:

https://videos.graylog.com/watch/VTdQfbzHSvgFAA6n2r4AM1?

It's not clear whether this is the same issue. It also appears that the metric rate calculation is done in the front end?
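If the rate is in fact derived in the browser from successive `total` samples, both symptoms in this thread could plausibly arise. The sketch below is hypothetical and is not Graylog's front-end code: `naive_rate` treats a missing previous sample as zero, so the first reading equals the cumulative total, and if consecutive polls return totals from different counters (for example, two nodes whose counts differ), the delta alternates in sign. Whether either of these happens in Graylog is an assumption.

```python
from typing import Optional


def naive_rate(prev_total: Optional[int], curr_total: int, interval_s: float) -> float:
    """Hypothetical per-second rate computed from two cumulative counter samples.

    A missing previous sample is treated as 0, so the first result is the
    whole cumulative total divided by the polling interval.
    """
    prev = prev_total if prev_total is not None else 0
    return (curr_total - prev) / interval_s


# First poll: no previous sample, so the "rate" is just the cumulative total.
print(naive_rate(None, 7_189_178, 1.0))  # 7189178.0

# Polls alternating between two counters with different totals
# produce a rate that flips sign on every poll.
samples = [1_000_000, 5_000_000, 1_000_100, 5_000_100]
rates = [naive_rate(a, b, 1.0) for a, b in zip(samples, samples[1:])]
print(rates)  # [4000000.0, -3999900.0, 4000000.0]
```

A common safeguard in client-side rate calculations is to emit nothing until two samples from the same counter exist and to treat a negative delta as a counter reset rather than a real rate.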

drewmiranda-gl • Jun 20 '24 21:06

@waab76 Can we please get an update on this issue?

StefanTheGerman • May 09 '25 08:05

@tellistone Any idea when the Core Team might have appetite for this?

waab76 • May 23 '25 17:05