Negative values in pipeline rules throughput
Expected Behavior
Pipeline rule throughput values should always be non-negative and reflect the actual message rate.
Current Behavior
Throughput values for Pipeline rules toggle between positive and negative.
Steps to Reproduce
- You need a lot of traffic; we couldn't reproduce the issue ourselves, but the customer said he is available to help with reproducing it.
- The attached recording shows the whole problem: live recording.webm
- Create a Pipeline rule with a lot of traffic.
- Check the throughput values.
Context
Throughput values for Pipeline rules toggle from positive to negative - see the attached recording. This happens only on one pipeline, called 'Streamrouting', which was set up 7 years ago and has been running since. The customer has ~70 pipelines; they checked some of them but only see the negative values on 'Streamrouting'.
The customer is aware that the first click when navigating to the Pipeline rules page shows wrong values, but here the wrong values are displayed constantly. They toggle from positive to negative every few seconds without any other interaction with the page.
Values are correct on the Manage Pipelines >> Pipelines overview page - no negative throughput numbers appear there.
On one screenshot we see ~300 million msg/s under Throughput while only around ~11k messages are actually coming in.
Code of the rule with millions of messages:
rule "continue to next stage"
when
true
then
set_field("continue", true);
remove_field("continue");
end
Screenshot showing minus ~385 million msg/s under Throughput
Screenshot showing ~300 million msg/s under Throughput but only around ~11k coming in
The issue was discussed in Slack (https://graylog.slack.com/archives/C036LC4K744/p1717668058792819); from that thread it sounds like a known issue, but no one had submitted a bug report until now.
Customer Environment
Graylog Version: 5.2.7
OpenSearch Version: 7.10.2
MongoDB Version: 6.0.12
(created from Zendesk ticket #570)
gz#570
https://github.com/Graylog2/graylog2-server/issues/19696 may be related too?
My original theory was that a number was getting too large for its data type, e.g. a value greater than 2,147,483,647 overflowing a signed 32-bit integer. I'm not sure if that is relevant, though.
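For illustration only (a hedged sketch, not a claim about where such a narrowing would happen in Graylog's code), this is what that overflow looks like when a large counter is coerced to a signed 32-bit integer:

// Minimal sketch of the signed 32-bit overflow theory; illustrative only.
// Any path that narrows a large counter to an Int32 (a bitwise op in the
// frontend, or a 32-bit field on the backend) wraps values past 2,147,483,647.
const total = 2_147_483_648;   // one past the signed 32-bit maximum
const narrowed = total | 0;    // bitwise OR coerces to a signed Int32 in JS/TS
console.log(total);            // 2147483648
console.log(narrowed);         // -2147483648: the sign flips on overflow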
The other thing I am noticing, which I can reproduce somewhat consistently, is that Graylog shows a much larger value initially before showing the accurate metric. I suspect it is showing the "total" amount accumulated so far rather than the accurate rate.
For example, I can sometimes get my pipeline to display a number of about 7 million, which matches up with the metric's total value:
{
  "full_name": "org.graylog.plugins.pipelineprocessor.ast.Pipeline.62976aa4578cf42110255552.executed",
  "metric": {
    "rate": {
      "total": 7189178,
      "mean": 88.88558165055522,
      "five_minute": 111.75158774985267,
      "fifteen_minute": 114.95970546937819,
      "one_minute": 102.30150129886736
    },
    "rate_unit": "events/second"
  },
  "name": "executed",
  "type": "meter"
}
In the video below I am going back/forward in the browser to trigger this behavior:
https://videos.graylog.com/watch/VTdQfbzHSvgFAA6n2r4AM1?
It's not clear if this is the same issue or not. It also appears the metric rate calculation is being done on the front end?
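If the displayed rate is indeed derived client-side from successive samples of the meter's "total", here is a hedged sketch (hypothetical names, not Graylog's actual API) of how both symptoms could fall out of that:

// Hypothetical sketch only; not Graylog's actual implementation. Assumes a
// UI that derives msg/s from the delta between successive `rate.total` samples.
type MeterSample = { total: number; timestampMs: number };

function deriveRate(prev: MeterSample | undefined, curr: MeterSample): number {
  if (!prev) {
    // First sample after navigation: with nothing to diff against, showing the
    // lifetime total (or dividing it by a near-zero interval) yields an
    // absurdly large "rate" on an old, busy pipeline.
    return curr.total;
  }
  const deltaMessages = curr.total - prev.total;
  const deltaSeconds = (curr.timestampMs - prev.timestampMs) / 1000;
  // If samples arrive out of order, come from different nodes, or the counter
  // resets, deltaMessages goes negative, and so does the displayed throughput.
  return deltaMessages / deltaSeconds;
}

If something like this is happening it would explain both the huge initial value and the periodic sign flips without the backend counters ever being negative, but that is speculation until someone checks where the calculation actually lives.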
@waab76 Can we pls get an update on this issue?
@tellistone Any idea when the Core Team might have appetite for this?