[Feat]: ability to "expand by" or similar for charts (e.g. expand by container, mount point etc)
Problem
Some users have expressed a preference for the old agent dashboard approach of a chart per container etc., so they can easily, by default, see metrics split out a bit more by something that makes sense, e.g. by container or mount point.
Background:
- https://www.reddit.com/r/netdata/comments/158j5lb/after_the_update_of_netdata_ui_its_more/
- https://github.com/netdata/netdata/discussions/15415
Description
Some options:
A. Some sort of "expand by" or "split by" that easily breaks the chart back out into its constituent pieces.
B. Specific point solutions around custom dashboards, for example a custom dashboard specifically for each container etc. that is also in some way dynamic as you add more container views.
Importance
really want
Value proposition
- better and easier visualizations for users
Proposed implementation
TBD
The ability to set and override chart defaults at the space and room level could be a partial solution here too;
e.g. a user could simply set the default group-by for the various container charts to "by cgroup".
https://github.com/netdata/netdata-cloud/issues/789
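As a purely hypothetical illustration of that idea (no such file, section, or key exists in Netdata today), a room-level override could look something like this, borrowing the agent's INI style:

```
# HYPOTHETICAL sketch of room-level chart defaults.
# The file, section names, and keys are invented for illustration;
# Netdata has no such mechanism today.
[chart:cgroup.cpu]
    default group by = cgroup

[chart:cgroup.mem]
    default group by = cgroup
```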
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/6-years-experience-but-can-not-use-netdata/4994/2
@Pingger I saw you discussing this feature in some places, are you willing to jump on a call with me (PM from Netdata) and our Product Designer to better discuss your use-case and expectations?
@hugovalente-pm While getting on a call might be difficult, here are a few relevant things:
- I am demonstrating based on my private setup (https://netdata.iskariot.info), but also manage a bigger one for the company I work for.
- I manage multiple servers and netdata nodes on those servers
- Those Nodes should be able to be put into groups in the left pane, without a cloud account!
- e.g. by the tags defined in netdata.conf, or a specific one like "dashboard-groups" (see the host-labels sketch after this list) - at the moment my private "servers" (2 root-servers and a NAS) are shown in the same graphs as my notebooks, which leads to confusing information about a server suddenly having wifi, until you realize that a notebook got mixed in again
- I also use netdata on my private Devices (basically everything Linux I have uses netdata)
- All netdata instances feed into a single "big" netdata instance that holds the stats for the previous ~14 months (atm ~10 GiB dbengine; the "tiers" update cost me my database!)
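For what it's worth, the agent already supports custom host labels in netdata.conf, which could carry exactly this kind of grouping tag; a minimal sketch, assuming current host-label support (the label names and values here are made up for illustration):

```
# netdata.conf on each node; labels are applied when the agent restarts
[host labels]
    # "dashboard-group" is an invented label name for this sketch
    dashboard-group = private-servers
    type = nas
```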
The following mainly boils down to:
- It is very clunky to get the graphs to be filtered the way you want
- and the filters don't persist
- I need more than 1 filter preset / a graph for each filter I want
- Because of those reasons, I have started to write my own dashboard (which is still in its infancy and thus has quite a lot of hardcoding going on...)
Containers/Cgroups:
- the nodes are specialised to run specific stuff:
- Webservers
- (expected to be) low cpu
- low ram
- high network
- Databases
- low cpu
- medium to high ram
- low network
- Git/CI
- low cpu with high spikes
- low ram with high spikes
- low network
- Game-Servers
- depending on game cpu/ram
- medium network
- tor-nodes
- high cpu
- medium ram
- high network
- DNS-Server (can be wrapped into webservers)
- low cpu
- low ram
- low network
- Backup infrastructure
- low cpu
- low ram
- SHITLOAD of network
- Everything in those groups is inside a linux container/cgroup
- Those types/groups I would like to be displayed distinctly from each other, so I can, without having to change anything upon loading the page, compare databases to databases, webservers to webservers and definitely NOT WebServers to gameservers.
- For that it would be nice to be able to flag containers/cgroups (like you can add custom netdata flags for netdata instances)
- Alternatively, to just be able to use the WebUI, WITHOUT A CLOUD ACCOUNT!, to configure how to split groups apart (a rough sketch of the closest existing config-file knob follows this list).
- Also I'd like to have a similar graph for networking, and not a "Total" gauge that just sums the traffic.
- There was already some improvement on this graph, but it is still somewhat confusing ... note the "13 more values" on 2 of the graphs and "11 more values" on the other one. That makes no sense.
- adding netdata to each and every container is not feasible, as that overhead would add up. netdata is very resource friendly, but the idle usage I observed is 10-50% of a core (average of 25%). Multiply that by sometimes more than 20 containers and you see quite an impact - at the 25% average, roughly 5 full cores.
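Not a real answer to the grouping wish above, but the closest existing knob appears to be the cgroups pattern list in netdata.conf, which at least controls which cgroups get charts at all; a sketch assuming the simple-pattern syntax of recent agents (the exact default pattern list varies by version):

```
# netdata.conf; space-separated simple patterns, '!' negates a match,
# first matching pattern wins (defaults differ between agent versions)
[plugin:cgroups]
    enable by default cgroups matching = !*/init.scope *docker* *lxc* !/system.slice/* *
```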
Similar issues:
- Systemd services are currently ALL merged into a singular graph. The usefulness of that singular graph is exactly 0
- How many services are active? (in the screenshot 4? Of the few hundred that are actually running across all nodes?!)
- The CPU/RAM/...-Graphs for the systemd-units have the same issue the cgroups have.
- systemd base unit files should all be low cpu/ram/net...
- some services, e.g. the vpn-client, are instead low cpu/ram but high net
- Some graphs just don't show some information for no apparent reason:
- I would like to group the systemd units in a similar manner to the containers
- (Hard-)Drives and mount-points: A summary graph is fine, but I'd also like to have each drive/partition/volume by itself.
- Network interfaces: same, but in addition an up/down listing in the summary graphs would be nice, like the cgroups have for CPU and RAM
Other issues I have noticed:
- the health notifications sent to root via the system mail command ignore the delay rules and fire instantly instead, sometimes causing quite a spam of mails (see the health-config sketch after this list)
- dbengine tiers VERY often lose data for entire weeks or even months! (which is why I disabled those)
- A way to configure the time selector at the top to always default to "force play".
- The Dashboard pausing while hovering a graph is just plain annoying and should also be configurable
- health-configs can't be properly debugged. There is no apparent log or method to find out why a specific alarm doesn't register with a chart, or whether there are syntax errors.
- plugin configs should all be by themselves! (e.g. cgroups is configured in netdata.conf, while the go.d plugin has its own config); netdata.conf should be responsible only for netdata itself, not for every tiny plugin subsetting! On a clean installation it is 741 lines... most of it being the proc plugin, with commented-out settings that should be put into a proc.d folder instead.
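For reference on the delay-rules point above, the delay line in question is part of the standard health.d entity syntax; a minimal sketch of an alarm that uses it (chart, thresholds, and recipient are made up for illustration):

```
# e.g. health.d/example.conf - values invented for this sketch;
# the "delay:" line is the rule the mail notifications reportedly ignore
 template: example_cpu_usage
       on: system.cpu
   lookup: average -3m unaligned of user,system
    every: 1m
     warn: $this > 80
     crit: $this > 95
    delay: down 15m multiplier 1.5 max 1h
       to: sysadmin
```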
I'll try to keep this comment updated and with a changelog as issues/ideas arise, for the coming week or so.
Changelog:
- 2024-01-11 21:31 fixed typos, reordered a few points, because my jumping around while writing didn't help readability
- 2024-01-12 11:22 Added health-config debugging note
- 2024-01-12 17:30 Added netdata.conf size and inconsistency grievances