Policy failure count mismatch on host details page
Fleet version: 4.64.2
💥 Actual behavior
On the host's details page, the Policies tab and Issues counter are showing 59 policy failures, but the warning message on the tab says the device is failing only 9. The device is actually failing only 9 policies.
🧑💻 Steps to reproduce
- TODO
- TODO
🕯️ More info (optional)
N/A
I can reproduce this by manually updating my host_issues table, which is what drives the badge on the tab and the tooltip. The "This device is failing xxx policies" note is driven from the policies list retrieved from the host details API, which seems more up-to-date. The host_issues table is updated hourly by default.
Hey team! Please add your planning poker estimate with Zenhub @dantecatalfamo @jacobshandling @sgress454
I'll go ahead and source all three of these counts from the host.policies array, per Scott's breakdown above.
The UI-only solution above raises a related issue: a host's "total issues count" and its "critical vulnerabilities count" both still come from the slow-to-update host_issues table, so sourcing only the "Failing policies" count from the more recently updated data leaves these numbers inconsistent, since the total issues count should be the sum of the other two.
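To make the consistency requirement concrete, here is a minimal sketch of deriving all three counts from a single snapshot, so that the total is always the sum of the other two. The `HostPolicy` and `IssueCounts` types, the function names, and the `"fail"` response value are assumptions for illustration, not Fleet's actual types; the critical vulnerabilities count is simply passed in because this thread does not describe how it is computed.

```go
package main

import "fmt"

// HostPolicy is a hypothetical, trimmed-down view of one entry in host.policies.
type HostPolicy struct {
	Name     string
	Response string // assumed values: "pass", "fail", or "" (no response yet)
}

// IssueCounts groups the three numbers shown on the host details page.
type IssueCounts struct {
	FailingPolicies         int
	CriticalVulnerabilities int
	Total                   int
}

// countFailing returns how many policies in the list are currently failing.
func countFailing(policies []HostPolicy) int {
	failing := 0
	for _, p := range policies {
		if p.Response == "fail" {
			failing++
		}
	}
	return failing
}

// deriveCounts computes every count from the same inputs, so the total
// can never disagree with the two numbers it is supposed to sum.
func deriveCounts(policies []HostPolicy, criticalVulns int) IssueCounts {
	failing := countFailing(policies)
	return IssueCounts{
		FailingPolicies:         failing,
		CriticalVulnerabilities: criticalVulns,
		Total:                   failing + criticalVulns,
	}
}

func main() {
	policies := []HostPolicy{
		{Name: "disk encryption", Response: "fail"},
		{Name: "firewall enabled", Response: "pass"},
		{Name: "screen lock", Response: ""},
	}
	fmt.Printf("%+v\n", deriveCounts(policies, 2))
}
```

The point of the sketch is only that mixing a fresh failing-policies count with a stale total breaks the invariant; wherever the fix lives, all three values need to come from the same point in time.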
Because of this, it seems like this should be a backend fix, where GETing a host's details calculates the host's updated values, returns them, and writes them to the DB.
Note discrepancy here:
This makes sense--if we don't already have the information we need on the front end, then we need to get it from the API. I don't know how costly that is, but if we were only doing it once an hour it seems likely that it's not something we want to do at great frequency. We should consider gating this behind an API param.
But it's potentially not as costly to calculate for just one host at a time, as needed per request to hosts/:id, right? I'd think the once/hour schedule is probably there to account for doing it at the scale of all hosts.
We're talking about running UpdateHostIssuesFailingPolicies() when this API is hit, right? Having a GET request have any side effects like this is playing with fire -- the expectation is that these can be scripted and/or hammered (even if that expectation isn't completely fair). So if that's what we're talking about, we want to be very careful about it. It would be good to, for example, first check the updated_at of the row in host_issues so that we only do this max once per minute.
I see, thanks for the great insight. Will add this to the PR.
Mismatched count, gone, Fleet's truth shines like morning sun. Clear as glass city.