[Bug]: Dynamic Configuration Manager: Unable to Assign Existing Monitoring Jobs to New Node
Bug description
When using the Dynamic Configuration Manager in Netdata Cloud, I encounter the error message "Unknown config id given" when trying to assign an existing monitoring job to a new node by clicking "Submit to multiple nodes".
Expected behavior
The job should be submitted to the selected nodes successfully.
Steps to Reproduce
- Perform a full new install of a node (LXC on Proxmox):
  - Complete a fresh installation of a node using LXC in Proxmox.
- Install using the integrations auto shell script:
  - Run `kickstart.sh` and wait for the node to appear as active in the Netdata Cloud dashboard.
- Submit jobs to multiple nodes:
  - Navigate to "Manage Space / Netdata Space / Configurations".
  - Edit an existing job, such as "ping", and attempt to submit it to multiple nodes.
Installation method
kickstart.sh
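For context, a typical kickstart invocation looks like the following; the claim token, room ID, and URL are placeholders, not values from this report.

```sh
# Download and run the official Netdata kickstart installer, claiming the
# node to Netdata Cloud; the --claim-* values are placeholders.
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && \
  sh /tmp/netdata-kickstart.sh \
    --claim-token <YOUR_CLAIM_TOKEN> \
    --claim-rooms <YOUR_ROOM_ID> \
    --claim-url https://app.netdata.cloud
```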
System info
Linux netdata-sz-ctc 5.15.149-1-pve #1 SMP PVE 5.15.149-1 (2024-03-29T14:24Z) x86_64 x86_64 x86_64 GNU/Linux
/etc/os-release:NAME="Rocky Linux"
/etc/os-release:VERSION="9.4 (Blue Onyx)"
/etc/os-release:ID="rocky"
/etc/os-release:ID_LIKE="rhel centos fedora"
/etc/os-release:VERSION_ID="9.4"
/etc/os-release:PLATFORM_ID="platform:el9"
/etc/os-release:PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
/etc/os-release:ANSI_COLOR="0;32"
/etc/os-release:LOGO="fedora-logo-icon"
/etc/os-release:CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
/etc/os-release:SUPPORT_END="2032-05-31"
/etc/os-release:ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
/etc/os-release:ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
/etc/os-release:REDHAT_SUPPORT_PRODUCT="Rocky Linux"
/etc/os-release:REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
/etc/redhat-release:Rocky Linux release 9.4 (Blue Onyx)
/etc/rocky-release:Rocky Linux release 9.4 (Blue Onyx)
/etc/system-release:Rocky Linux release 9.4 (Blue Onyx)
Netdata build info
Packaging:
Netdata Version ____________________________________________ : v1.46.2
Installation Type __________________________________________ : binpkg-rpm
Package Architecture _______________________________________ : x86_64
Package Distro _____________________________________________ :
Configure Options __________________________________________ : dummy-configure-command
Default Directories:
User Configurations ________________________________________ : /etc/netdata
Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
Permanent Databases ________________________________________ : /var/lib/netdata
Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
Static Web Files ___________________________________________ : /usr/share/netdata/web
Log Files __________________________________________________ : /var/log/netdata
Lock Files _________________________________________________ : /var/lib/netdata/lock
Home _______________________________________________________ : /var/lib/netdata
Operating System:
Kernel _____________________________________________________ : Linux
Kernel Version _____________________________________________ : 5.15.149-1-pve
Operating System ___________________________________________ : unknown
Operating System ID ________________________________________ : unknown
Operating System ID Like ___________________________________ : unknown
Operating System Version ___________________________________ : unknown
Operating System Version ID ________________________________ : 9.4
Detection __________________________________________________ : unknown
Hardware:
CPU Cores __________________________________________________ : 2
CPU Frequency ______________________________________________ : 2000000000
RAM Bytes __________________________________________________ : 2147483648
Disk Capacity ______________________________________________ : 375141883904
CPU Architecture ___________________________________________ : x86_64
Virtualization Technology __________________________________ : none
Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
Container __________________________________________________ : lxc
Container Detection ________________________________________ : systemd-detect-virt
Container Orchestrator _____________________________________ : none
Container Operating System _________________________________ : Rocky Linux
Container Operating System ID ______________________________ : rocky
Container Operating System ID Like _________________________ : rhel centos fedora
Container Operating System Version _________________________ : 9.4 (Blue Onyx)
Container Operating System Version ID ______________________ : 9.4
Container Operating System Detection _______________________ : /etc/os-release
Features:
Built For __________________________________________________ : Linux
Netdata Cloud ______________________________________________ : YES
Health (trigger alerts and send notifications) _____________ : YES
Streaming (stream metrics to parent Netdata servers) _______ : YES
Back-filling (of higher database tiers) ____________________ : YES
Replication (fill the gaps of parent Netdata servers) ______ : YES
Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
Contexts (index all active and archived metrics) ___________ : YES
Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
Machine Learning ___________________________________________ : YES
Database Engines:
dbengine (compression) _____________________________________ : YES (zstd lz4)
alloc ______________________________________________________ : YES
ram ________________________________________________________ : YES
none _______________________________________________________ : YES
Connectivity Capabilities:
ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
static (Netdata internal web server) _______________________ : YES
h2o (web server) ___________________________________________ : YES
WebRTC (experimental) ______________________________________ : NO
Native HTTPS (TLS Support) _________________________________ : YES
TLS Host Verification ______________________________________ : YES
Libraries:
LZ4 (extremely fast lossless compression algorithm) ________ : YES
ZSTD (fast, lossless compression algorithm) ________________ : YES
zlib (lossless data-compression library) ___________________ : YES
Brotli (generic-purpose lossless compression algorithm) ____ : NO
protobuf (platform-neutral data serialization protocol) ____ : YES (system)
OpenSSL (cryptography) _____________________________________ : YES
libdatachannel (stand-alone WebRTC data channels) __________ : NO
JSON-C (lightweight JSON manipulation) _____________________ : YES
libcap (Linux capabilities system operations) ______________ : NO
libcrypto (cryptographic functions) ________________________ : YES
libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
apps (monitor processes) ___________________________________ : YES
cgroups (monitor containers and VMs) _______________________ : YES
cgroup-network (associate interfaces to CGROUPS) ___________ : YES
proc (monitor Linux systems) _______________________________ : YES
tc (monitor Linux network QoS) _____________________________ : YES
diskspace (monitor Linux mount points) _____________________ : YES
freebsd (monitor FreeBSD systems) __________________________ : NO
macos (monitor MacOS systems) ______________________________ : NO
statsd (collect custom application metrics) ________________ : YES
timex (check system clock synchronization) _________________ : YES
idlejitter (check system latency and jitter) _______________ : YES
bash (support shell data collection jobs - charts.d) _______ : YES
debugfs (kernel debugging metrics) _________________________ : YES
cups (monitor printers and print jobs) _____________________ : YES
ebpf (monitor system calls) ________________________________ : YES
freeipmi (monitor enterprise server H/W) ___________________ : YES
nfacct (gather netfilter accounting) _______________________ : NO
perf (collect kernel performance events) ___________________ : YES
slabinfo (monitor kernel object caching) ___________________ : YES
Xen ________________________________________________________ : NO
Xen VBD Error Tracking _____________________________________ : NO
Logs Management ____________________________________________ : YES
Exporters:
AWS Kinesis ________________________________________________ : NO
GCP PubSub _________________________________________________ : NO
MongoDB ____________________________________________________ : YES
Prometheus (OpenMetrics) Exporter __________________________ : YES
Prometheus Remote Write ____________________________________ : YES
Graphite ___________________________________________________ : YES
Graphite HTTP / HTTPS ______________________________________ : YES
JSON _______________________________________________________ : YES
JSON HTTP / HTTPS __________________________________________ : YES
OpenTSDB ___________________________________________________ : YES
OpenTSDB HTTP / HTTPS ______________________________________ : YES
All Metrics API ____________________________________________ : YES
Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
Trace All Netdata Allocations (with charts) ________________ : NO
Developer Mode (more runtime checks, slower) _______________ : NO
Additional info
No response
Thank you @onion83 for reporting! We're investigating and aim to fix this soon.
@onion83, hey. Can you try creating a ping job directly on `netdata-sz-ctc`? You will need to select the node.
> Unknown config id given
This may indicate that go.d.plugin (which provides the ping functionality) is not running on that particular node.
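If it helps, here is a quick way to verify that on the node; the command is illustrative, and any process listing works just as well.

```sh
# List the go.d.plugin process if it is running; warn otherwise.
pgrep -fa go.d.plugin || echo "go.d.plugin is not running"
```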
As shown in the attached video, I recreated a brand-new node, netdata-sz-ctc2, and added it to the dashboard, confirming it is online. The video shows netdata.cloud on the left and the local node on the right.
- SSH into `netdata-sz-ctc2` and confirm via the `ps` command that `go.d.plugin` is running in the system processes.
- Create a local job named `localtest` and confirm its success (a file-based equivalent is sketched after this list).
- In the `netdata.cloud` management console, attempt to sync an existing node's (`netdata-cmc`) `apps` monitoring job to `netdata-sz-ctc2`. This failed.
- In the local backend of `netdata-sz-ctc2`, add a job named `apps` with the monitoring target `1.1.1.1`.
- On the `netdata-cmc` node, use the "Submit to multiple nodes" feature and select `netdata-sz-ctc2` as the sync target. This time, the submission succeeded.
- After refreshing the browser with F5 and editing the `apps` monitoring job on `netdata-sz-ctc2`, the monitoring target is now fully synchronized with `netdata-cmc`.
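For reference, here is a file-based equivalent of the `localtest` job created in the UI above: a minimal sketch using the stock go.d.plugin config layout. The path and restart command assume a standard systemd install.

```sh
# Define a static go.d/ping job named "localtest", then restart the
# agent so the ping collector picks it up.
sudo tee /etc/netdata/go.d/ping.conf >/dev/null <<'EOF'
jobs:
  - name: localtest
    hosts:
      - 1.1.1.1
EOF
sudo systemctl restart netdata
```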
Therefore, the current bug is: after adding a new node, an empty job with the same name must first be created on it before a job can be synced from other nodes (only tested with the ping plugin; other plugins not tested).
Expected:
- Automatically create non-existent monitoring tasks during synchronization.
- Feature: Use an auto-install script to join the same room and automatically sync all monitoring tasks, avoiding manual configuration and improving operational efficiency.
https://github.com/user-attachments/assets/107211c2-fd1c-45b8-9d9b-e25392fdf517
@onion83, hey. Not related to the issue, but: performance in "privileged" mode becomes less efficient as the number of targets grows, because CPU usage scales disproportionately (it increases much faster than the number of targets). That is a bug in the upstream library we use for go.d/ping. See netdata/netdata#15410.
hey @ilyam8 Please take a look at the title, issue, and video description. This is specifically about the job distribution issue in the Dynamic Configuration Manager, not about ping values, system permissions, CPU, etc.
I know that, that is why I started with "Not related to the issue".
@ilyam8: Is this a bug on the agent side? I don't see why the user needs to create a local job (on the local Agent dashboard) before submitting it to multiple nodes. Or @kapantzak, have you identified some issue on the FE side?
@sashwathn I don't see any FE issue here
> Is this a bug on the agent side?
What is happening:
- @onion83 uses the "update" action to sync (copy) a dyncfg item from one node to another:
  - Click Edit an existing job on A.
  - Click Submit to Multiple Nodes.
  - Select Nodes to submit (B, C, ...).
- This results in a "Dyncfg functions intercept: id is not found" error for any node other than the job source (A), because we are trying to do an "update", which Netdata rejects: "update" only works on an existing job.
We need to provide another way to copy dyncfg items from Node to Node, or treat "update" as "add" if there is no existing job.
> or treat "update" as "add" if there is no existing job.
I think we need this; I will discuss it with @ktsaou when he returns.
So for all nodes it is an "update", but for the new nodes it has to be an "add".
The solution is to convert an update to an add if the item is not already there?
> The solution is to convert an update to an add if the item is not already there?
Yes.
cc @onion83
An alternative is to use this workflow:
- Click Edit an existing job on A.
- Click "copy this item and create a new one".
- Copy/paste the name (so it appears with the same name on other nodes).
- Select Nodes to submit (B, C, ...) and Submit.
This will result in "add" - no issues.
https://github.com/user-attachments/assets/d4c102cd-8f38-4ae3-8c73-eb8f45e4e964
@kapantzak hey 👋 We discussed the issue with @ktsaou and suggest the following changes to the frontend (a sketch follows the list):
- When doing "Submit to multiple nodes" during "Edit":
  - If the node is not the origin (for the origin, always "update"):
    - Do a "get" request first to find out whether the item exists.
    - If it exists - "update".
    - Otherwise - "add".
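For clarity, here is the same decision flow as shell-style pseudocode: a minimal sketch in which `dyncfg_get`, `dyncfg_update`, `dyncfg_add`, and `ORIGIN_NODE` are hypothetical names standing in for the frontend's real dyncfg requests, not actual Netdata APIs.

```sh
# Sketch of the proposed "Submit to multiple nodes" decision flow.
# dyncfg_get / dyncfg_update / dyncfg_add are hypothetical helpers.
submit_to_node() {
  node="$1"; config_id="$2"; payload="$3"
  if [ "$node" = "$ORIGIN_NODE" ]; then
    dyncfg_update "$node" "$config_id" "$payload"   # origin: always "update"
  elif dyncfg_get "$node" "$config_id" >/dev/null 2>&1; then
    dyncfg_update "$node" "$config_id" "$payload"   # item exists: "update"
  else
    dyncfg_add "$node" "$config_id" "$payload"      # item missing: "add"
  fi
}
```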
Hi @onion83, we released some changes for this that hopefully fix the issue.
It works! Thank you.