graylog2-server

rows excluded from the limit should be summed up into a single row named "other"

Open tellistone opened this issue 4 years ago • 4 comments

The way the "limit" option works within aggregations is not intuitive and produces strictly misleading outputs/visualisations.

Expected Behavior

Rows excluded from an aggregation using the limit option should be summed up into a single row named "other". This way, the relative percentages of each row always remain constant and accurate.

Current Behavior

At present, if I have a pie chart that would have 7 different "rows" in the legend, and I apply a limit of 5, I will get a pie chart that excludes the remaining 2 "rows" entirely from the results.

Context

The limit option in Aggregations should not exclude results from the series.

Excluding rows this way distorts the output's ability to show relative proportions (e.g. what % of captured events come from each row) and can make for very misleading results.

For example, does kernel represent 52.3% of results? Or 41% of results? Or, in fact, neither?

These two pie charts show the SAME DATASET, just with a different "limit" set on rows.

Screenshot 2021-12-08 at 10 28 25

Screenshot 2021-12-08 at 10 28 08
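
The effect can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not Graylog's code: the source names and counts are made up (they are not the data behind the screenshots), and `shares_with_limit` is a hypothetical helper that mimics a chart which drops every row beyond the limit.

```python
# Hypothetical message counts per row; NOT the data from the screenshots.
counts = {
    "kernel": 410, "sshd": 200, "cron": 150, "sudo": 100,
    "systemd": 60, "nginx": 50, "postfix": 30,
}


def shares_with_limit(counts, limit):
    """Percentages as shown when rows beyond `limit` are simply dropped."""
    # Keep only the `limit` largest rows, as a "top values" limit would.
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    # The percentages are computed against the kept rows only,
    # so the denominator shrinks as the limit gets smaller.
    total = sum(value for _, value in top)
    return {name: round(100 * value / total, 1) for name, value in top}


print(shares_with_limit(counts, 7)["kernel"])  # 41.0 (all rows kept)
print(shares_with_limit(counts, 5)["kernel"])  # 44.6 (two rows dropped)
```

The same row reports a different share depending only on the limit, even though the underlying messages have not changed.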

The way it should work in my view (this is the way it works on Splunk for example) is that rows excluded from the limit should be summed up into a single row named "other". This way, the relative percentages always remain constant and accurate.
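
A minimal sketch of that behaviour, assuming the aggregation result is a simple mapping of row name to count. `limit_with_other` is a hypothetical helper written for illustration (it is neither Graylog's nor Splunk's implementation), and the counts are made up.

```python
# Hypothetical message counts per row, for illustration only.
counts = {
    "kernel": 410, "sshd": 200, "cron": 150, "sudo": 100,
    "systemd": 60, "nginx": 50, "postfix": 30,
}


def limit_with_other(counts, limit, other_label="other"):
    """Keep the `limit` largest rows and sum the rest into one extra row."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:limit])
    remainder = sum(value for _, value in ranked[limit:])
    # Only add the extra row when something was actually excluded.
    # Assumption: no real row is already called `other_label`.
    if remainder > 0:
        kept[other_label] = remainder
    return kept


limited = limit_with_other(counts, 5)
total = sum(limited.values())
print(limited["other"])                             # 80 (nginx + postfix)
print(round(100 * limited["kernel"] / total, 1))    # 41.0, same as without a limit
```

Because the total is unchanged, every remaining row keeps the same percentage regardless of the limit chosen.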

Why is the current way the limit function works a cardinal sin? I think because the aggregation controls should not be able to affect which messages are encompassed by the aggregation visualisation - only the search filter should be able to define which results are encompassed. The aggregation controls should only be allowed to determine how those results are displayed. It's important to separate powers in the interface this way so the user can understand where their results are coming from - by effectively having two separate ways to filter out results, you make neither one definitive.

Your Environment

  • Graylog Version: 4.2.0

tellistone • Dec 08 '21 10:12

Cousin of https://github.com/Graylog2/graylog2-server/issues/11516

tellistone • Dec 08 '21 11:12

We should have an option to display the "other" group as well, as we had it in the old quick values widget. There are multiple paths and options, the context menu says "Show top values", which one could argue doesn't necessarily need to have the "others" group, but for many applications users will want to know the distribution and thus knowing how many "others" there are is important.

kroepke • Dec 13 '21 13:12

In that scenario, I'd suggest the option to display the "other" group should be enabled by default (default settings should not filter messages out of results).

tellistone • Dec 14 '21 09:12

Bumping this, it's so frustrating looking for a middle ground between "graph is too busy to read" and "half the results are missing".

tellistone • Sep 01 '23 09:09