[Feat]: ability to delete unseen, stale or live nodes (other than Offline)
Problem
As a user, I can only delete "offline" nodes from NC. I should be able to delete any nodes I want.
We need to split the problem into cases with node status as a key:
- Online - We could either ban on the cloud level (not ideal) or instruct the agent to disconnect by dropping its cloud configuration. This is easy for directly connected nodes. A more complicated case is when the agent connects through a claimed parent or a set of parents, or when there is more than one parent in line for the node. We could disable streaming in such a case, I think. Banning only the cloud connection at the parent level would mean that the parent still collects data from the node in question.
- Stale - Same as above, but display a warning that data for this node is going to be deleted too (we should instruct the parent(s) to do so, either by marking the data for removal and letting the garbage collector do its job, or by enforcing the operation directly).
- Unseen - Just let me remove it and remove all the data that this particular node managed to imprint on the cloud - mostly DB entry and credentials for mqtt. I do not know if it is even possible to have an Unseen node connected through the parent so I have no idea about handling this case.
- Offline - there is an ability to remove node already.
Example: I had a group of 11 nodes streaming to my parent. I deleted these VMs since I no longer need them. However, I still see them in Netdata Cloud and am unable to delete them from NC.
Should I not be able to delete them? Unsure if this is a bug or a feature request.
These nodes are gone and never coming back, so I would like to remove them from NC. I guess the data for them might eventually fall away on my parent, and maybe then they would show as offline in NC and I could delete them. Unsure.

https://netdata-cloud.slack.com/archives/CS3PB0VJ7/p1671026396555759
Description
- Cleaner infra view.
- Control over the space without waiting X days for nodes to be marked as offline.
- More freedom in testing things without fear of injecting ghost nodes or the same node more than once (changing configuration by accident or on purpose might change the claimid)
- Probably fewer ghost spaces - I imagine that a user who is just starting with NDC and testing its capabilities might create a new space just to clean up the view.
- I believe some users were confused when they first tried NDC because they couldn't delete the nodes that were either set up incorrectly or already switched off. It could be a cause for dropping the offering entirely, especially when dealing with dynamic environments.
Importance
must have
Value proposition
- let me keep my space clean
Proposed implementation
No response
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/cant-delete-stale-nodes/3909/2
I found you can delete them if you delete the parent and re-install fresh on that parent machine. You have to remove the parent and all vnodes from the cloud dashboard and then when you re-claim the parent host it will set things up fresh.
I tried to erase my historical data directly to see if that would clear it up, as a workaround until netdata makes an official way to do this. I opened up the list of stale nodes, moused over the stale node I wanted to delete, and copied a link like https://EXAMPLE.ORG/v2/spaces/DOMAINTLD/rooms/local/nodes/888586af-e5ab-47f2-8094-c4948fd1243a.
Then I extracted the UUID and deleted the folder that holds its data on my parent node:
systemctl stop netdata
cd /var/lib/netdata
rm -r 888586af-e5ab-47f2-8094-c4948fd1243a ... # deleting each of the folders
systemctl start netdata
After restarting, the charts are gone, but the node itself is still listed as "stale".
So that wasn't enough.
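For reference, the "extract the UUID from the link" step above could be scripted. This is only a sketch: it assumes the node UUID is the last path segment of the copied link, as in the example URL, and the function name is made up for illustration.

```python
import re
from urllib.parse import urlparse

# Matches the canonical 8-4-4-4-12 hex layout of a UUID.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def node_uuid_from_link(link):
    """Return the UUID found in the last path segment of a node link, or None.

    Assumption: the dashboard link ends in .../nodes/<uuid>, like the
    example link quoted in this comment.
    """
    last_segment = urlparse(link).path.rstrip("/").rsplit("/", 1)[-1]
    if UUID_RE.fullmatch(last_segment):
        return last_segment.lower()
    return None
```

Calling it on the example link returns the same UUID that was used as the folder name in the commands above.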
I poked around some more and found this sqlite database:
root@monitor:~# sqlite3 /var/cache/netdata/netdata-meta.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> .tables
alert_hash dimension host metadata_migration
chart health_log host_info node_instance
chart_label health_log_detail host_label
sqlite> select * from host where hostname='host1.example.org';
host_id|hostname|registry_hostname|update_every|os|timezone|tags|hops|memory_mode|abbrev_timezone|utc_offset|program_name|program_version|entries|health_enabled
<binary host_id>|host1.example.org|host1.example.org|15|linux|America/Toronto||1|5|EST|-18000|netdata|v1.33.1|0|1
<binary host_id>|host1.example.org|host1.example.org|15|linux|Etc/UTC||1|5|EST|-18000|netdata|v1.42.1|0|1
Annoyingly, host_id (presumably the UUID) is stored in binary while the rest is stored as text, but I was able to remove the entry with:
sqlite> delete from host where hostname='host1.example.org' and program_version='v1.33.1';
After another
root@monitor:~# systemctl restart netdata
the stale node is now gone from my dashboard. :tada:
Unfortunately this is not very clean. I believe there are still entries in the host_label, host_info and node_instance tables referencing the deleted host_id, but I don't know how to input binary data in the sqlite CLI and I don't feel like digging out Python right now to do it, so the garbage is just going to sit around.
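A rough Python sketch of the cleanup described above, for anyone who wants to try it. It rests on two unverified assumptions: that host_id holds the 16 raw bytes of the node UUID (the comment above only says "presumably"), and that the related tables also have a column named host_id. The function name is made up, and tables without such a column are skipped rather than guessed at.

```python
import sqlite3
import uuid

# Table names are taken from the .tables listing above. That each of them
# has a host_id column holding the raw 16-byte UUID is an ASSUMPTION.
CANDIDATE_TABLES = ("host_label", "host_info", "node_instance", "host")


def delete_host_rows(conn, host_uuid):
    """Delete rows whose host_id equals the 16-byte form of host_uuid.

    Returns a dict mapping table name to the number of rows removed.
    Tables that are missing or have no host_id column are skipped.
    """
    blob = uuid.UUID(host_uuid).bytes
    removed = {}
    for table in CANDIDATE_TABLES:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if "host_id" not in columns:
            continue
        cursor = conn.execute(f"DELETE FROM {table} WHERE host_id = ?", (blob,))
        removed[table] = cursor.rowcount
    conn.commit()
    return removed
```

It would be run with netdata stopped, against a backup copy first, for example `delete_host_rows(sqlite3.connect("/var/cache/netdata/netdata-meta.db"), "888586af-e5ab-47f2-8094-c4948fd1243a")`. Inside the sqlite3 shell itself, `select hex(host_id), hostname from host;` prints the binary column readably, and SQLite accepts blob literals written as X'...' with the hex digits inside.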
I had an installation problem with a node and now it's marked as "Stale" and "delete is disabled". The node is dead and will never be coming back. How do I get rid of this thing? Is there really no way to delete this??
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/impossible-to-delete-stale-node/5537/1
@netdata-community-bot funny. that's MY post.
+1 bumping this feature request. I'd hate to have to hack around in a database to be able to get rid of stale machines that landed there by accident, and waiting for the data to expire seems like an inelegant alternative.
edit: It turns out there is a way, but it's not GUI-friendly. Got this from https://community.netdata.cloud/t/impossible-to-delete-stale-node/5537/3
- From app.netdata.cloud, navigate to your Node list
- Next to the name of the Stale node, click on the little (i) symbol (View node information)
- At the very bottom of the panel that opens to the right, you will see a "View node info in json" button - click it. You should see a message that says “JSON copied to clipboard”
- Paste that into a text editor.
- Grab the value of the id: {...} key. This should be a string in UUID format, e.g. 6e072590-a422-45b2-bdab-cdd3fb14ad68
- Connect to your parent node via SSH
- Execute the following command:
netdatacli remove-stale-node {uuid}
substituting {uuid} above with your real one
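The "grab the id and substitute it" steps above could be scripted roughly like this. It is a sketch, not an official tool: it assumes the copied JSON has a top-level "id" key holding the UUID, and both function names are invented for illustration. Only the `netdatacli remove-stale-node` command itself comes from the steps above.

```python
import json
import uuid


def node_id_from_json(text):
    """Return the node UUID stored under the top-level "id" key.

    Assumption: the JSON copied from the node information panel has "id"
    at the top level. uuid.UUID raises ValueError if the value is not a
    valid UUID, which catches a wrong key or a bad paste early.
    """
    data = json.loads(text)
    return str(uuid.UUID(data["id"]))


def removal_command(node_id):
    """Build the argument list for the command given in the steps above."""
    return ["netdatacli", "remove-stale-node", node_id]
```

On the parent node, the resulting list could be passed to `subprocess.run(..., check=True)`, or simply printed and pasted into the SSH session.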
@darxtorm Until they make this easier, here are steps I took recently to remove a stale node, which were kindly provided by @ilyam8. Worked for me.
Combination of GUI and CLI
- From app.netdata.cloud, navigate to your Node list
- Next to the name of the Stale node, click on the little (i) symbol (View node information)
- At the very bottom of the panel that opens to the right, you will see a "View node info in json" button - click it. You should see a message that says “JSON copied to clipboard”
- Paste that into a text editor.
- Grab the value of the id: {...} key. This should be a string in UUID format, e.g. 6e072590-a422-45b2-bdab-cdd3fb14ad68
- Connect to your parent node via SSH
- Execute the following command:
netdatacli remove-stale-node {uuid}
substituting {uuid} above with your real one, obviously…
CLI-only
- ssh to the PARENT node
- run netdatacli aclk-state
- locate the stale node's UUID
- run netdatacli remove-stale-node {uuid}
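To help with the "locate the stale node's UUID" step, here is a small sketch that lists every UUID-shaped string in the `netdatacli aclk-state` output. The layout of that output is not shown in this thread, so the helper makes no assumptions about labels or ordering. It only finds candidates, and picking the right one is still up to you. The function name is hypothetical.

```python
import re

# Canonical 8-4-4-4-12 hex layout, bounded so longer hex runs are ignored.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def uuids_in(text):
    """Return the distinct UUID-shaped strings in text, in first-seen order."""
    seen = []
    for match in UUID_RE.findall(text):
        value = match.lower()
        if value not in seen:
            seen.append(value)
    return seen


# Possible use on the parent node (not executed here):
#   import subprocess
#   out = subprocess.run(["netdatacli", "aclk-state"],
#                        capture_output=True, text=True, check=True).stdout
#   for candidate in uuids_in(out):
#       print(candidate)
```

If the output contains more than one kind of identifier per node, all of them will be listed, so compare against the hostname lines before running the removal command.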
@darxtorm Until they make this easier, here are steps I took recently to remove a stale node, which were kindly provided by @ilyam8. Worked for me.
Absolutely, it's mildly clunky to say the least. Wanted to add that for cloud at least, after I had performed the above, I also had to go to Manage Space -> Nodes and perform a delete in there (the node was now showing as Offline rather than Stale, and the delete button was no longer disabled) to truly get rid of the ghost!
I used netdatacli remove-stale-node on a bunch of stale nodes but it didn't have any effect, other than changing the Node ID in the netdatacli aclk-state output from a UUID to null.
Is there something else I'm missing? Each time I'd run the command, it would say something like:
Unregistering node with machine guid 83fb052f-49ee-11ab-b00f-3e2f6b85cde4, hostname = dc413990ab4a
(We had a bunch of test containers spin up and they all "registered" with our (on-prem) Netdata instance and now I can't figure out how to remove them...)
@eddyg Restarting the parent node should make them disappear from the UI.
@stelfrag see https://github.com/netdata/netdata-cloud/issues/690#issuecomment-2259543333, is it expected that a restart is required?
@sashwathn hey, I think we need to allow removing stale nodes from the UI. It will simplify users' lives tremendously.
@eddyg Restarting the parent node should make them disappear from the UI.
Fixed in https://github.com/netdata/netdata/pull/18381
Thanks for following up on this, Ilya!
Removing stale nodes:
- by hostname
- all stale nodes at once (when using the ALL_NODES keyword)
added in https://github.com/netdata/netdata/pull/18386