Add option to customize query report cap
Goal
| User story |
|---|
| As a Fleet admin, |
| I want to customize the result cap for query reports |
| so that I can see results for all of my hosts. |
Context
- Requestor(s): @rachaelshaw
- Product designer: @rachaelshaw
Currently, for queries that run on more than 1,000 hosts, query reports in the Fleet UI serve as previews of the data returned rather than true reports of the latest results. (Users with that many hosts need to send data to a log destination to build a complete, up-to-date report, since reports in Fleet are clipped at 1,000 results.)
Changes
Product
- [ ] UI changes: Figma
- [ ] CLI usage changes: TODO
- [ ] REST API changes: TODO
- [ ] Permissions changes: TODO
- [ ] Outdated documentation changes: Assuming there are 20 queries running on all hosts that each return 1 result per host, what resources do we recommend? Best practice: set the cap just above the total number of hosts. Why? The use case we currently understand is adding custom info to the host details page for each host. Think “is the CrowdStrike Falcon agent healthy or not?”
- [ ] Changes to paid features or tiers: TODO
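
For the sizing question in the documentation item above, a rough back-of-the-envelope sketch may help. It assumes the cap applies per query report and that a report stores at most `cap` rows. The function name and the host count are hypothetical and are not part of Fleet.

```python
def stored_rows(num_queries: int, num_hosts: int, results_per_host: int, cap: int) -> int:
    """Estimate the total rows kept across all query reports.

    Assumption: each report keeps at most `cap` rows (a per-query cap).
    """
    rows_per_report = min(num_hosts * results_per_host, cap)
    return num_queries * rows_per_report


# Hypothetical deployment: 5,000 hosts, 20 queries, 1 result per host.
hosts = 5_000
print(stored_rows(20, hosts, 1, cap=1_000))      # 20000: every report is clipped
print(stored_rows(20, hosts, 1, cap=hosts + 1))  # 100000: cap set just above host count
```

Under these assumptions, raising the cap to just above the host count multiplies stored rows by roughly the ratio of hosts to the old cap, which is the number the resource recommendation would need to account for.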
Engineering
- [ ] Database schema migrations: TODO
- [ ] Load testing: TODO
ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".
QA
Risk assessment
- Requires load testing: TODO
- Risk level: Low / High TODO
- Risk description: TODO
Manual testing steps
- Step 1
- Step 2
- Step 3
Testing notes
Confirmation
- [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
- [ ] QA (@____): Added comment to user story confirming successful completion of QA.
Heads up @rachaelshaw this request was discussed during feature fest last week and didn't make it into the current design sprint.
@nonpunctual I'm adding your tag to this as it has come up again. Feel free to remove.
Almost ready to call this settled. One more thing we'd like to get feedback on is how we handle existing reports when the query report cap is adjusted.
The ideal way would be to only delete results from reports that don't fit with the updated cap:
A simpler way would be to just delete all query results if the setting is changed:
@mostlikelee since you worked on the original feature, do you think you could give a rough idea of how much work it would be to handle this edge case the nicer way vs. the deleting-everything way? Here's the Figma for more context (happy to answer any questions at standup if any of this isn't clear)
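
To make the two options concrete, here is a minimal sketch. It assumes that "results that don't fit" means reports whose stored row count exceeds the new cap, and that such reports are cleared whole. The function names, query names, and in-memory dict are hypothetical and do not reflect Fleet's actual schema or code.

```python
def reports_to_clear_selective(report_sizes: dict[str, int], new_cap: int) -> list[str]:
    """Option 1: clear only the reports holding more rows than the new cap."""
    return sorted(name for name, rows in report_sizes.items() if rows > new_cap)


def reports_to_clear_all(report_sizes: dict[str, int]) -> list[str]:
    """Option 2: clear every report whenever the setting changes."""
    return sorted(report_sizes)


# Hypothetical stored row counts per saved query.
sizes = {"falcon_health": 950, "disk_usage": 400, "os_version": 120}

print(reports_to_clear_selective(sizes, new_cap=500))  # ['falcon_health']
print(reports_to_clear_all(sizes))  # ['disk_usage', 'falcon_health', 'os_version']
```

The selective option needs a per-report row count at the moment the setting changes, while the simpler option needs no lookup at all.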
related: https://github.com/fleetdm/fleet/issues/7766 https://github.com/fleetdm/fleet/issues/15381 https://github.com/fleetdm/fleet/issues/11492
BE: 8 FE: 2-3
Hey @rachaelshaw, are the TODOs in the product section still TODOs?
Also, which version did we end up estimating? The original version or the alternate version?
> Hey @rachaelshaw, are the TODOs in the product section still TODOs?

Just the REST API (configuration endpoints); updated the description. (I can make a draft PR as well once the changes to the modify configuration formatting are merged in.)

> Also, which version did we end up estimating? The original version or the alternate version?

@noahtalerman I think the original version, although we kind of glossed over that in estimation. I got the impression that part of it didn't have a huge impact on the estimate. Do you think I should move it to scratchpad, or leave it in case it does end up making a difference?
The query report cap was divided into 2 separate issues: this one for the UI and #19600 for the config (which will ship first).
FYI @mostlikelee and @rachaelshaw, I moved our convo from design review below for safekeeping.
Noah: Query reports solve the live query on all hosts problem.
Noah: Something new for custom columns on the Hosts page. New kind of query.
Tim: When you send a query to all hosts and they’re all online, the UI freezes up for a couple of minutes.
Tim: This gets compounded when you save multiple queries in quick succession.
Tim: osquery-perf might not behave the way real-life osquery does. osquery might spread out queries on a specific host.
Noah: What about the Fleet server?
Tim: We could do extra work to spread out the results
Noah: So we could sink a bunch of time into improving the Fleet database and spreading out processing and doing testing with real osquery and still not get to 100,000 results.
Tim: Using Elasticsearch works for 50,000 hosts and 30 queries (1 hr interval). Each query returns one result. One query that returns 10 results per host.
- Cost questions
- New infra requirements = maintenance for cloud and self-managed users
Noah: Ignoring the cost, sounds great for managed cloud
Tim: Could be a paid feature?
Tim: Would be helpful to come up with frequency and number of queries to help narrow scope and cost analysis.
Noah: Start with 100 queries running every hour.
Noah: Have a minimum value for queries with reports?
Noah: Have a max number of queries with reports on?
Noah: Is Jamf every 24 hrs? Ask Brock.
Tim: MySQL will be less performant for custom queries.
Tim: Extension attributes have to return a single string.
This didn't get designed within the 3-week drafting timeline, so we're bringing it back to Feature Fest.