unknown vendor and model

Open inphobia opened this issue 2 years ago • 16 comments

Expected Behavior

being able to figure out just what device has unknown vendor & model.

Current Behavior

somehow i have a vendor "Unknown" (capital u) and model "unknown" (lowercase u) in my inventory and i can't for the life of me figure out which device that is. can't click through on either vendor or model, and had no success trying to find it in the database.

image

netdisco=> \pset null '<null>'
Null display is "<null>".

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE vendor = 'unknown';
     ip      |  uptime   |  layers  |  mac   | serial | model | vendor  |   os   |     snmp_class
-------------+-----------+----------+--------+--------+-------+---------+--------+--------------------
 10.91.45.25 | 216654576 | 01000000 | <null> |        |       | unknown | <null> | SNMP::Info::Layer7
 10.109.1.22 |  18580172 | 01000000 | <null> |        |       | unknown | <null> | SNMP::Info::Layer7
(2 rows)

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE vendor = 'Unknown';
 ip | uptime | layers | mac | serial | model | vendor | os | snmp_class
----+--------+--------+-----+--------+-------+--------+----+------------
(0 rows)

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE model = 'unknown';
 ip | uptime | layers | mac | serial | model | vendor | os | snmp_class
----+--------+--------+-----+--------+-------+--------+----+------------
(0 rows)

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE model = 'Unknown';
 ip | uptime | layers | mac | serial | model | vendor | os | snmp_class
----+--------+--------+-----+--------+-------+--------+----+------------
(0 rows)

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE model is NULL;
 ip | uptime | layers | mac | serial | model | vendor | os | snmp_class
----+--------+--------+-----+--------+-------+--------+----+------------
(0 rows)

netdisco=> SELECT ip,uptime,layers,mac,serial,model,vendor,os,snmp_class FROM public.device WHERE snmp_class is NULL;
 ip | uptime | layers | mac | serial | model | vendor | os | snmp_class
----+--------+--------+-----+--------+-------+--------+----+------------
(0 rows)

this returned all snmp v3 devices, but not my mystery device: SELECT ip,uptime,layers,mac,model,vendor,os,snmp_class,snmp_comm FROM public.device WHERE snmp_comm is NULL;

SELECT * FROM public.device WHERE vendor is NULL; returns 26 rows, while the inventory says it knows of only 1 Unknown / unknown. it's interesting however that all 26 results have snmp_class "SNMP::Info" and vendor NULL.

Possible Solution

can be added to an unknown device/model report if i had any clue where this mystery device is hiding.

the only thing i can think of is the query SELECT * FROM public.device WHERE vendor is NULL; might get collapsed and shown as 1 device by the magic of dbix?

or, since i have 26 devices which identify with snmp_class "SNMP::Info" and have vendor NULL those are the issue? but while vendor is null, model is an empty string. do null & "" get handled the same perhaps?
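On that last question: in standard SQL, NULL and the empty string are different values, so `model = ''`, `model IS NULL` and `vendor IS NULL` each select a different set of rows. A minimal sketch of the difference, using Python's built-in sqlite3 as a stand-in for the PostgreSQL database. The three-column table and its rows are made up for illustration and are not the real `device` schema:

```python
import sqlite3

# In-memory stand-in for the netdisco "device" table (hypothetical subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (ip TEXT, vendor TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO device VALUES (?, ?, ?)",
    [
        ("192.0.2.1", None, ""),         # vendor NULL, model empty string
        ("192.0.2.2", None, ""),
        ("192.0.2.3", "unknown", ""),    # vendor is the literal text 'unknown'
        ("192.0.2.4", "cisco", "c9300"),
    ],
)

def ips(where):
    """Return the sorted list of IPs matching a WHERE clause."""
    rows = conn.execute(f"SELECT ip FROM device WHERE {where} ORDER BY ip")
    return [row[0] for row in rows]

null_vendor = ips("vendor IS NULL")   # only rows where vendor is NULL
empty_model = ips("model = ''")       # only rows where model is an empty string
null_model = ips("model IS NULL")     # no rows: '' is not NULL
# One query that catches every "blank" combination at once.
either = ips("vendor IS NULL OR vendor = '' OR model IS NULL OR model = ''")

print(null_vendor, empty_model, null_model, either)
```

A single query that checks both NULL and `''` on both columns, like the last one here, is probably the quickest way to list every candidate for the mystery entry.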

Steps to Reproduce (for bugs)

if only i knew.

Context

sometimes devices from other teams get picked up by netdisco and discovered, when this happens i look at why and make an exclude filter and remove the devices again. most of the time this involves the unwanted device using public as community. i'm somewhat undecided how to handle this: it's a good way to find poorly configured devices, but also means i need to clean up & exclude at regular intervals.

until i get my db cleaned up again i've limited the use of public to a group of 4 ips.

  - tag: 'default_v2_readonly_4'
    community: 'public'
    read: true
    write: false
    only: "group:snmppublic"

Your Environment

  • Netdisco version used: 2.071001
  • SNMP::Info version used: 3.95

Config info (deployment.yml)

Device information

https://github.com/netdisco/netdisco/wiki/Snapshot#share-a-snapshot

inphobia avatar Dec 13 '23 20:12 inphobia

We have the same issue. Haven't found the source yet. Occasionally I delete the incorrect devices from the database with the SQL delete from device where model = '' and os is null;.

JeroenvIS avatar Sep 17 '24 13:09 JeroenvIS

can you upgrade and try again? Netdisco recently has improved vendor and model naming design. I'll close this as it's a very old ticket but feel free to reopen if needed.

ollyg avatar Jan 30 '25 19:01 ollyg

i upgrade every few weeks - think i'm on 2.084002 now, or close to that release, but i actually still had 1 "Unknown - unknown" when i checked yesterday.

iirc this tends to happen when the snmp::info class returns "SNMP::Info". i'll do my best to check at work later & try & gather some relevant info.

inphobia avatar May 28 '25 05:05 inphobia

Hi @inphobia ! OK many thanks for the update

I don't think the unknown/unknown is a bug or error, it's just the typical "unsupported vendor" issue we get from time to time (or a vendor not sticking to standards).

However since time passed, we have a new report "Device Inventory" in the reports menu and this very nicely shows the devices that are creating unknown in the main inventory page.

Ticket #1336 opened recently and suggested to make the unknown be clickable, and what I plan to do is link them to that Device Inventory report (unfortunately in Netdisco it's not actually possible to search for "undefined" or "empty" database fields - this is the next best thing).

If you do find out what vendor/model is causing the unknown fields, you can open a ticket for improved vendor support for the platform.

Hope that helps,

ollyg avatar May 28 '25 09:05 ollyg

Oh, forgot to say, there is actually a bug in the unknown presentation - I think it always displays "1" even if there are more devices!
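For illustration only, a count that reflects every device in the unknown bucket could be built along these lines. This is not Netdisco's actual inventory query (that is not shown in this thread); it is a sketch that folds NULL and empty values into the "Unknown" / "unknown" labels seen on the inventory page, run against a made-up table in SQLite:

```python
import sqlite3

# Hypothetical cut-down device table; rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (ip TEXT, vendor TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO device VALUES (?, ?, ?)",
    [
        ("192.0.2.1", None, ""),
        ("192.0.2.2", None, ""),
        ("192.0.2.3", "", None),
        ("192.0.2.4", "cisco", "c9300"),
    ],
)

# NULLIF turns '' into NULL, COALESCE then maps NULL to a display label,
# so NULL and '' land in the same group and COUNT(*) covers all of them.
query = """
    SELECT COALESCE(NULLIF(vendor, ''), 'Unknown') AS vendor_label,
           COALESCE(NULLIF(model, ''), 'unknown')  AS model_label,
           COUNT(*)                                AS devices
    FROM device
    GROUP BY vendor_label, model_label
    ORDER BY vendor_label, model_label
"""
inventory = conn.execute(query).fetchall()

for vendor_label, model_label, devices in inventory:
    print(vendor_label, model_label, devices)
```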

ollyg avatar May 28 '25 10:05 ollyg

lets see, running 2.085001 atm and have this on the inventory page.

Image

Image

running a query on the device table for class "snmp::info" gives these:

ip creation dns description uptime contact name location layers num_ports mac serial model ps1_type ps2_type ps1_status ps2_status fan slots vendor os os_ver log snmp_ver snmp_class vtp_domain last_discover last_macsuck last_arpnip snmp_engineid chassis_id is_pseudo pae_is_enabled custom_fields tags vtp_mode
10.1.1.101 2025-02-25 11:09:14.753505 csb-template.aquafinad.aquafin.be NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-02-25 15:06:08.288821 2025-02-25 12:41:03.190983 2025-02-25 12:05:58.974865     FALSE FALSE {} {} NULL
10.40.254.17 2019-04-09 23:27:39.327608 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-05-28 11:09:03.336622 NULL NULL     FALSE FALSE {} {} NULL
10.40.254.18 2022-01-26 10:57:59.800236 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-05-28 11:08:36.988435 NULL NULL     FALSE FALSE {} {} NULL
10.86.20.1 2022-05-20 17:30:32.47817 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-05-18 19:05:24.631804 2023-10-11 09:33:48.758403 2023-10-11 08:32:33.347032     FALSE FALSE {} {} NULL
10.86.100.1 2022-05-20 17:21:15.592163 NULL Linux kp1375 2.6.39 #1 Sat Sep 22 06:37:38 UTC 2018 armv5tejl 616014674 systeembeheer kp1375 gprs NULL 8 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-05-28 12:12:23.939332 2025-05-18 00:35:33.984641 2025-05-18 00:05:04.945184 80001f88808e00774a77e2d867   FALSE FALSE {} {} NULL
10.92.24.254 2023-07-21 23:32:48.567829 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info   2025-04-21 07:09:33.985884 2025-04-21 04:42:25.993581 2025-04-21 04:07:16.578891     FALSE FALSE {} {} server
10.98.4.100 2022-09-15 15:19:46.555838 HiveAP215.aquafinad.aquafin.be NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-02-19 15:06:34.190048 2025-02-19 12:40:34.136221 2025-02-19 12:06:08.137279     FALSE FALSE {} {} NULL
10.98.13.134 2019-05-08 15:05:46.75633 HiveAP62.aquafinad.aquafin.be NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-04-09 15:06:34.950064 2025-04-09 12:40:00.607325 2025-04-09 12:06:22.478934     FALSE FALSE {} {} NULL
10.98.14.201 2018-12-05 11:25:20.722678 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-03-12 11:07:25.456632 2025-03-12 08:42:44.245381 NULL     FALSE FALSE {} {} NULL
10.98.26.112 2025-05-27 19:11:08.565086 LA9005.aquafinad.aquafin.be AP305C, IQ Engine 10.7r5b build-c6be9b0 574402805 [email protected] NULL Grimbergen_vestiging|Grimbergen_verdiep NULL 19 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-05-28 07:05:28.321039 NULL NULL 800069300390b832356c00   FALSE FALSE {} {} NULL
10.98.50.23 2023-11-09 15:05:22.708079 HiveAP214.aquafinad.aquafin.be NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-04-18 15:08:41.710987 2025-03-28 08:38:24.879655 2025-03-28 08:05:21.752747     FALSE FALSE {} {} NULL
10.98.65.119 2024-11-20 15:42:48.413564 cbs250-8p-wulpen-rwzi-admin.aquafinad.aquafin.be NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-04-01 15:05:51.77095 2025-04-01 12:35:55.339303 2025-04-01 12:06:19.943375     FALSE FALSE {} {} NULL
10.104.2.2 2024-11-05 11:41:45.055762 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-02-12 15:07:50.53451 NULL 2025-02-12 12:06:41.230519     FALSE FALSE {} {} NULL
10.105.2.1 2025-04-16 10:19:45.836725 NULL NULL NULL   NULL   NULL 0 NULL     NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 2 SNMP::Info NULL 2025-04-16 11:10:36.606144 2025-04-16 10:19:51.126495 2025-04-16 10:19:51.154512     FALSE FALSE {} {} NULL

as you said it also seems that it never goes higher than "1".

a few other observations:

in the "recently added devices" report they show up if they're in the date range - no matter how many (so more than 1 is sometimes visible there)

in the "device inventory" report only 1 device will show up, which just seems to be the first one returned by the db. none of the others show up, even when looking for their ips.

and "inventory by model by os" gives you this nice blank line at the bottom of the page :)

Image

inphobia avatar May 28 '25 11:05 inphobia

hm; for some reason my expire job is no longer running via the scheduler, which alrdy explains a bit of why those devices didn't go away by themselves.

41778374	"2025-05-26 02:30:00.806442"	"2025-05-26 02:30:01"	"2025-05-26 02:30:01"			"expire"		"error"						"linux005.aquafin.be"
41799652	"2025-05-27 02:30:02.828036"	"2025-05-27 02:30:03"	"2025-05-27 02:30:03"			"expire"		"error"						"linux005.aquafin.be"
41821024	"2025-05-28 02:30:42.829958"	"2025-05-28 02:30:43"	"2025-05-28 02:30:43"			"expire"		"error"						"linux005.aquafin.be"

but when running expire via netdisco-do does work.....

inphobia avatar May 28 '25 11:05 inphobia

> We have the same issue. Haven't found the source yet. Occasionally I delete the incorrect devices from the database with the SQL delete from device where model = '' and os is null;.

my hunch is that some of the devices just don't answer fast enough for snmp::info to try & define a class for them (mobile routers on 220ms latency & 0.5% - 2% packet loss, for example). a device that just falls under the snmp::info class and, like in the example from my database a bit higher, has "NULL" in most of the database fields is one i'd rather just ignore. but as flexible as netdisco's acls are, i haven't found a way to disallow "snmp::info".

inphobia avatar May 28 '25 12:05 inphobia

That's a fair hunch, @inphobia !

(assuming we can't just slow down net-snmp with longer timeouts...) Would it help to have some setting in Netdisco that aborts a discovery if:

  • the device class is SNMP::Info (because you know it must be something else) or
  • if certain fields like os or os_ver aren't returned (when you know they should be).

This is a bit like https://github.com/netdisco/netdisco/wiki/Configuration#snmp_field_protection only the other way around.

ollyg avatar May 28 '25 12:05 ollyg

(the settings would be based on ACL of course to select the devices by IP)

ollyg avatar May 28 '25 12:05 ollyg

> hm; for some reason my expire job is no longer running via the scheduler, which alrdy explains a bit of why those devices didn't go away by themselves.
>
> 41778374	"2025-05-26 02:30:00.806442"	"2025-05-26 02:30:01"	"2025-05-26 02:30:01"			"expire"		"error"						"linux005.aquafin.be"
> 41799652	"2025-05-27 02:30:02.828036"	"2025-05-27 02:30:03"	"2025-05-27 02:30:03"			"expire"		"error"						"linux005.aquafin.be"
> 41821024	"2025-05-28 02:30:42.829958"	"2025-05-28 02:30:43"	"2025-05-28 02:30:43"			"expire"		"error"						"linux005.aquafin.be"
>
> but when running expire via netdisco-do does work.....

and postgres logs this at the same time as the failing expire jobs:

2025-05-28 02:30:43.771 CEST netdisco netdisco [875426]ERROR:  invalid input syntax for type inet: "linux005.aquafin.be"
2025-05-28 02:30:43.771 CEST netdisco netdisco [875426]CONTEXT:  unnamed portal parameter $3 = '...'
2025-05-28 02:30:43.771 CEST netdisco netdisco [875426]STATEMENT:  INSERT INTO user_log ( details, event, userip, username) VALUES ( $1, $2, $3, $4 )

(dns resolving - forward&reverse, is working btw)
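The CONTEXT line elides the value, but the ERROR line shows a hostname arriving where the third column of the INSERT list, userip, expects an inet value. PostgreSQL's inet type accepts only IPv4/IPv6 addresses (optionally with a netmask), never hostnames, regardless of whether DNS works. A rough way to see the distinction, using Python's `ipaddress` module as an approximation of what inet accepts (the helper name is made up, and this is not Netdisco code):

```python
import ipaddress

def is_inet_literal(value):
    """Return True if value parses as an IPv4/IPv6 address, optionally with a prefix length."""
    try:
        ipaddress.ip_interface(value)
        return True
    except ValueError:
        return False

# The hostname from the failing expire job versus address literals.
candidates = ["linux005.aquafin.be", "192.0.2.10", "192.0.2.10/24", "2001:db8::1"]
results = {value: is_inet_literal(value) for value in candidates}

for value, ok in results.items():
    print(f"{value!r}: {'accepted' if ok else 'rejected'}")
```

This would also fit with the job succeeding via netdisco-do, if that code path supplies a different value (or none) for userip; that part is a guess.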

inphobia avatar May 28 '25 12:05 inphobia

> That's a fair hunch, @inphobia !
>
> (assuming we can't just slow down net-snmp with longer timeouts...) Would it help to have some setting in Netdisco that aborts a discovery if:
>
> * the device class is SNMP::Info (because you know it _must_ be something else) or
> * if certain fields like `os` or `os_ver` aren't returned (when you know they should be).
>
> This is a bit like https://github.com/netdisco/netdisco/wiki/Configuration#snmp_field_protection only the other way around.

ah yeah, good point: i'm currently running with an snmptimeout of 8200000 (so about 3 times higher than the default), no bulkwalk for about half my devices, and workers timeout set to 630 (32 workers). it's a balancing act between not overloading the server when devices answer quickly and not having 1500 jobs in the queue trying to poll mobile routers. (i also have disabled snmp field protection for serials, they get replaced way too often)

we have around 2000-2500 mobile routers, at any given time you can expect at least 20 - 30 to be offline, and lets say between 100 and 200 of them with 1% packet loss & 150ms+ latency. trying to predict which ones will have issues at a given time is futile, even more so with the 3g sunset. or the spectrum is overloaded & we get kicked to 2g. or the coax cable / antenna gets stolen, you name it.

ideally it would be great to ignore the output for a device if it's already known but due to timeouts gets moved into snmp::info base. otoh i'd also like to retry devices every few weeks to see if they got their act together.

come to think of it, something like the snmp connect failure accounting we already have, but retooled for devices that identify as "snmp::info", might be what i am looking for.

inphobia avatar May 28 '25 13:05 inphobia

Hi @inphobia well, interestingly snmp_field_protection will work for the device class! Sorry it has such a bad name. You can use "snmp_class" in the config (instead of "serial" which you disabled).

That way, it will be sticky. Unfortunately you can't prevent an initial value of "SNMP::Info" being stored, though (at least, off the top of my head right now... I may think of a way).

Also for trying devices after a bit of time, Netdisco should retry any device with too many deferrals after one week (retry_after), and also when the backend is restarted it retries every device once which was being held due to SNMP failures.

ollyg avatar May 28 '25 13:05 ollyg

(although, it seems SNMP field protection is a cancel, not a defer, so I'll need to check the behaviour on that)

ollyg avatar May 28 '25 13:05 ollyg

> We have the same issue. Haven't found the source yet. Occasionally I delete the incorrect devices from the database with the SQL delete from device where model = '' and os is null;.

> my hunch is that some of the devices just don't answer fast enough for snmp::info to try & define a class for them (mobile routers on 220ms latency & 0.5% - 2% packet loss for example) a device that just falls under the snmp::info class and, like in the example from my database a bit higher, have "NULL" in most of the database fields i'd rather just ignore. but as flexible as netdisco's acl's are i haven't found a way to disallow "snmp::info".

That's an interesting hunch. Could be one way to end up with these entries. I don't think that it's the case with my entries, though.

The majority of entries that I keep getting (and deleting) with model = '' and os is null and snmp_class = 'SNMP::Info' are Juniper Mist access points. Their IP addresses (advertised via LLDP) are in one of the IP ranges in discover_only, but they should be skipped IMHO because:

  • discover_waps is set to false
  • they explicitly advertise 'wlanAccessPoint' capability through LLDP

What I see though, when I discover a switch with these APs connected, is that on one hand Netdisco logs that they will be skipped, but then they are queued after all:

[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - 172.31.168.41 with ID [70:90:41:01:20:20] on F17
[1480908] 2025-06-03 09:34:32 debug is_discoverable: 172.31.168.41 matches wap_platforms but discover_waps is not enabled
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - skip: 172.31.168.41 of type [Mist Systems 802.11ax Access Point.] excluded by discover_* config
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - 172.31.168.47 with ID [00:3e:73:16:e5:fe] on F18
[1480908] 2025-06-03 09:34:32 debug is_discoverable: 172.31.168.47 matches wap_platforms but discover_waps is not enabled
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - skip: 172.31.168.47 of type [Mist Systems 802.11ax Access Point.] excluded by discover_* config
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - 172.31.168.41 with ID [70:90:41:01:20:20] on F17
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] neigh - 172.31.168.47 with ID [00:3e:73:16:e5:fe] on F18
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] queue - queued 172.31.168.41 for discovery (ID: [70:90:41:01:20:20])
[1480908] 2025-06-03 09:34:32 debug  [172.31.136.26] queue - queued 172.31.168.47 for discovery (ID: [00:3e:73:16:e5:fe])

...and even though they do not respond to SNMP, still their IPs end up in the netdisco.device table - without an SNMP community, with SNMP::Info as snmp_class, empty model and null os.

JeroenvIS avatar Jun 03 '25 09:06 JeroenvIS

Hi @JeroenvIS that's quite a bug! Can you share the output of trying 'netdisco-do discover -D' (debug discover) on one of the Mist APs (after deleting the crappy db entry) ? Does it succeed at all?

ollyg avatar Jun 03 '25 10:06 ollyg

@ollyg sorry, I completely missed your follow-up.

Here's the requested output, a discover for one of the APs while its IP is not in netdisco.device table yet:

netdisco@linux786:~$ netdisco-do discover -D -d 172.31.173.198
[3609338] 2025-08-27 09:45:58  info App::Netdisco version 2.080003 loaded.
[3609338] 2025-08-27 09:45:58  info discover: [172.31.173.198] started at Wed Aug 27 11:45:58 2025
[3609338] 2025-08-27 09:45:59 debug discover: running with timeout 600s
[3609338] 2025-08-27 09:45:59 debug //// CHECK \\\\ phase
[3609338] 2025-08-27 09:45:59 debug ⮕ worker Internal::BackendFQDN p1000000
[3609338] 2025-08-27 09:45:59 debug ⮕ worker Internal::SNMPFastDiscover p1000000
[3609338] 2025-08-27 09:45:59 debug running with configured SNMP timeouts
[3609338] 2025-08-27 09:45:59 debug ⮕ worker Discover p0
[3609338] 2025-08-27 09:45:59 debug ⬅ (done) Discover is able to run.
[3609338] 2025-08-27 09:45:59 debug //// EARLY \\\\ phase
[3609338] 2025-08-27 09:45:59 debug ⮕ worker Discover::Properties p100
[3609338] 2025-08-27 09:45:59 debug snmp reader cache warm: [172.31.173.198]
[3609338] 2025-08-27 09:45:59 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:45:59 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:00 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:01 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:01 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:02 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:02 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:03 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:04 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:04 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:05 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:05 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:06 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:07 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:07 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:08 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:08 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:09 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:10 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:10 debug [172.31.173.198:161] try_connect with v: 2, t: 0.2, r: 0, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:11 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:46:38 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:47:05 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:47:32 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:47:59 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:48:26 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:48:53 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:49:20 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:49:47 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:50:14 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:50:41 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:51:08 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:51:35 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:52:02 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:52:29 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:52:56 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:53:23 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:53:50 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:54:17 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:54:44 debug [172.31.173.198:161] try_connect with v: 2, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:55:11 debug [172.31.173.198:161] try_connect with v: 1, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:55:38 debug [172.31.173.198:161] try_connect with v: 1, t: 3, r: 2, class: SNMP::Info, comm: <hidden>
[3609338] 2025-08-27 09:55:59 debug {}
[3609338] 2025-08-27 09:57:02 debug ⬅ (done) Ended discover for 172.31.173.198
[3609338] 2025-08-27 09:57:02 debug ⮕ worker Discover::Properties p100
[3609338] 2025-08-27 09:57:02 debug ⬅ (info)  [172.31.173.198] device - OK to continue discover (not a duplicate)
[3609338] 2025-08-27 09:57:02 debug ⮕ worker Discover::Properties p100
[3609338] 2025-08-27 09:57:11 debug ⬅ (info)  [172.31.173.198] device - OK to continue discover (valid interfaces)
[3609338] 2025-08-27 09:57:11 debug ⮕ worker Discover::Properties p100
[3609338] 2025-08-27 09:58:50 debug  resolving 0 aliases with max 50 outstanding requests
[3609338] 2025-08-27 09:58:50 debug  [172.31.173.198] device - removed 1 aliases
[3609338] 2025-08-27 09:58:50 debug ⬅ (info)  [172.31.173.198] aliases - added 1 new aliases and 0 subnets
[3609338] 2025-08-27 09:58:50 debug ⮕ worker Discover::Properties p100
[3609338] 2025-08-27 10:00:29 error  [172.31.173.198] interfaces - Error! Failed to get uptime from device!
[3609338] 2025-08-27 10:00:29 debug ⬅ (error) discover failed: no uptime from device 172.31.173.198!
[3609338] 2025-08-27 10:00:29 debug //// MAIN \\\\ phase
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::CanonicalIP p100
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::Entities p100
[3609338] 2025-08-27 10:00:29 debug  [172.31.173.198] modules - removed 1 chassis modules
[3609338] 2025-08-27 10:00:29 debug ⬅ (info)  [172.31.173.198] modules - 0 chassis components (added one pseudo for chassis)
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::Neighbors p100
[3609338] 2025-08-27 10:00:29 debug  [172.31.173.198] neigh - removed 0 outdated manual topology links
[3609338] 2025-08-27 10:00:29 debug  [172.31.173.198] neigh - setting manual topology links
[3609338] 2025-08-27 10:00:29 debug  [172.31.173.198] neigh - neighbor protocols are not enabled
[3609338] 2025-08-27 10:00:29 debug ⬅ (info)  [172.31.173.198] neigh - processed 0 neighbors
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::Neighbors::DOCSIS p100
[3609338] 2025-08-27 10:00:29 debug ⬅ (info)  [172.31.173.198] neigh - no modems (probably not a DOCSIS device)
[3609338] 2025-08-27 10:00:29 debug ⮕ worker PythonShim netdisco.worklet.discover.nexthopneighbors.main.cli.juniper_junos p200
[3609338] 2025-08-27 10:00:29 debug ⬅ (info) skip: acls restricted
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::NextHopNeighbors p100
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::PortPower p100
[3609338] 2025-08-27 10:00:29 debug ⬅ (info)  [172.31.173.198] power - 0 power modules
[3609338] 2025-08-27 10:00:29 debug ⮕ worker Discover::PortProperties p100
[3609338] 2025-08-27 10:00:38 debug  [172.31.173.198] resolving 0 remote_ips with max 50 outstanding requests
[3609338] 2025-08-27 10:00:47 debug ⬅ (info)  [172.31.173.198] no port properties to record
[3609338] 2025-08-27 10:00:47 debug ⮕ worker Discover::Properties::Tags p0
[3609338] 2025-08-27 10:00:47 debug ⮕ worker Discover::Properties::Tags p0
[3609338] 2025-08-27 10:00:47 debug ⮕ worker Discover::VLANs p100
[3609338] 2025-08-27 10:00:56 debug  [172.31.173.198] vlans - removed 0 port VLANs
[3609338] 2025-08-27 10:00:56 debug  [172.31.173.198] vlans - added 0 new port VLANs
[3609338] 2025-08-27 10:00:56 debug  [172.31.173.198] vlans - removed 0 device VLANs
[3609338] 2025-08-27 10:00:56 debug  [172.31.173.198] vlans - added 0 new device VLANs
[3609338] 2025-08-27 10:00:56 debug ⬅ (info)  [172.31.173.198] vlans - discovered for ports and device
[3609338] 2025-08-27 10:00:56 debug ⮕ worker Discover::Wireless p100
[3609338] 2025-08-27 10:00:56 debug ⮕ worker Discover::WithNodes p0
[3609338] 2025-08-27 10:00:56 debug //// STORE \\\\ phase
[3609338] 2025-08-27 10:00:56 debug ⮕ worker Discover::NextHopNeighbors p0
[3609338] 2025-08-27 10:00:56 debug //// LATE \\\\ phase
[3609338] 2025-08-27 10:00:56 debug ⮕ worker Discover::Hooks p0
[3609338] 2025-08-27 10:00:56 debug ⬅ (info)  [172.31.173.198] hooks - skipping due to incomplete job
[3609338] 2025-08-27 10:00:56 debug ⮕ worker Discover::Snapshot p0
[3609338] 2025-08-27 10:00:56 debug discover: timed out!
[3609338] 2025-08-27 10:00:56 debug ⬅ (error) job timed out after 600 sec
[3609338] 2025-08-27 10:00:56  info discover: finished at Wed Aug 27 12:00:56 2025
[3609338] 2025-08-27 10:00:56  info discover: status error: discover failed: no uptime from device 172.31.173.198!
netdisco@linux786:~$

By the way, as a more permanent fix, I added '(?i)Access.+Point' to discover_no_type. So our discover_no_type list now looks like this:

discover_no_type:
  - '(?i)phone'
  - '(?i)(?:wap|wireless)'
  - 'cisco\s+AIR-LAP'
  - 'AIR-AP\d+'
  - '(?i)AP1G\d'
  - '(?i)AP3G\d'
  - '(?i)IP\s+Phone'
  - '(?i)Access.+Point'
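
As a quick sanity check of which entry does the work, the list can be run against the LLDP type string from the earlier debug log. This sketch uses Python's `re` as a stand-in for the Perl regex engine Netdisco uses (these patterns only use syntax common to both), and it assumes the patterns are applied as unanchored searches:

```python
import re

# The discover_no_type patterns from the list above.
discover_no_type = [
    r"(?i)phone",
    r"(?i)(?:wap|wireless)",
    r"cisco\s+AIR-LAP",
    r"AIR-AP\d+",
    r"(?i)AP1G\d",
    r"(?i)AP3G\d",
    r"(?i)IP\s+Phone",
    r"(?i)Access.+Point",
]

# Type string the Mist APs advertise, as seen in the neighbor debug output.
lldp_type = "Mist Systems 802.11ax Access Point."

# Collect every pattern that matches somewhere in the type string.
matching = [pattern for pattern in discover_no_type if re.search(pattern, lldp_type)]
print(matching)
```

Under those assumptions only the newly added pattern matches, so none of the earlier entries would have excluded these APs by type.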

JeroenvIS avatar Aug 27 '25 12:08 JeroenvIS

Thanks @JeroenvIS ! Wow, there's a lot going on here :-)

Am interested to see the timeout happen on the job and can see why... and things that could be improved/addressed in Netdisco design.

The 09:55:59 debug {} message is interesting and suspicious to me... @JeroenvIS have you added any of your own debug messaging to the source? (just to check before looking into it)

The order of some of the workers is a bit wrong in my opinion and can be tweaked which might address some of the problem.

Clearly some data is added to the database even after the job is rejected due to no uptime... needs looking into.

So thanks, plenty to dig into there! I'm glad you have a fix though in the meantime :)

ollyg avatar Aug 27 '25 12:08 ollyg

> Thanks @JeroenvIS ! Wow, there's a lot going on here :-)

Yes, there's always a lot going on in our network ;-)

> The 09:55:59 debug {} message is interesting and suspicious to me... @JeroenvIS have you added any of your own debug messaging to the source? (just to check before looking into it)

As far as I know, we don't have local changes for debugging.

JeroenvIS avatar Aug 28 '25 11:08 JeroenvIS

OK this issue should be fixed in release 2.088002

The above debug was priceless @JeroenvIS - I found FIVE issues to fix in netdisco (at least two, maybe three of which were affecting you). I hope now that there are no more ghost devices and things run smoothly.

Closing the ticket but let me know if anything appears wonky still after the upgrade.

(one side effect, which will be in the release notes, is that Netdisco pins devices a bit more tightly to SNMPv2 even when you try to migrate to SNMPv3 by adding details to device_auth. In that case, a no/only ACL should be used on the v2 device_auth stanza to block it before the device is rediscovered)

ollyg avatar Aug 31 '25 18:08 ollyg