fleet Add option to customize query report cap

Goal

User story
As a Fleet admin,
I want to customize the result cap for query reports
so that I can see results for all of my hosts.

Context

Requestor(s): @rachaelshaw
Product designer: @rachaelshaw

Currently, for queries that run on more than 1,000 hosts, query reports in the Fleet UI are only previews of the returned data, not true reports of the latest results. Because reports in Fleet are capped at 1,000 results, users with larger fleets need to send data to a log destination to build a complete, up-to-date report.

Changes

Product

[ ] UI changes: Figma
[ ] CLI usage changes: TODO
[ ] REST API changes: TODO
[ ] Permissions changes: TODO
[ ] Outdated documentation changes: Assuming there are 20 queries running on all hosts, each returning 1 result per host, what resources do we recommend? Best practice: set the cap just above the total number of hosts. Why? The current use case, as we understand it, is adding custom info to the host details page for each host, for example: "Is the CrowdStrike Falcon agent healthy or not?"
[ ] Changes to paid features or tiers: TODO
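To make the sizing question in the documentation item concrete, here is a rough worst-case row count. This is a hypothetical sketch, not Fleet's sizing guidance: it assumes every query runs on every host, each host returns the same number of rows, and each report keeps at most `cap` rows. The 10,000-host fleet and the function name are made up for illustration, and it ignores row size, MySQL overhead, and write rate.

```python
def max_stored_rows(num_queries: int, hosts: int, rows_per_host: int, cap: int) -> int:
    """Worst-case number of rows held across all query reports.

    Assumes every query runs on every host, each host returns the same
    number of rows, and each report keeps at most `cap` rows.
    """
    rows_per_report = min(cap, hosts * rows_per_host)
    return num_queries * rows_per_report


hosts = 10_000  # hypothetical fleet size
cap = hosts + 1  # one reading of "right above the total number of hosts"

print(max_stored_rows(num_queries=20, hosts=hosts, rows_per_host=1, cap=cap))    # → 200000
print(max_stored_rows(num_queries=20, hosts=hosts, rows_per_host=1, cap=1_000))  # → 20000
```

With more than one row per host, a cap just above the host count still clips reports: with `rows_per_host=2`, each report would stop at 10,001 of its 20,000 rows.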

Engineering

[ ] Database schema migrations: TODO
[ ] Load testing: TODO

ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

Requires load testing: TODO
Risk level: Low / High TODO
Risk description: TODO

Manual testing steps

Step 1
Step 2
Step 3

Testing notes

Confirmation

[ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
[ ] QA (@____): Added comment to user story confirming successful completion of QA.

Dec 15 '23 18:12 rachaelshaw

Heads up @rachaelshaw, this request was discussed during Feature Fest last week and didn't make it into the current design sprint.

Jan 10 '24 14:01 noahtalerman

@nonpunctual I'm adding your tag to this as it has come up again. Feel free to remove.

Apr 24 '24 16:04 ksatter

Almost ready to call this settled. One more thing we'd like to get feedback on is how we handle existing reports when the query report cap is adjusted.

The ideal way would be to delete results only from reports that no longer fit within the updated cap: [Screenshot 2024-05-16 at 11:36:06 AM]

A simpler way would be to delete all query results whenever the setting changes: [Screenshot 2024-05-16 at 11:36:17 AM]

@mostlikelee, since you worked on the original feature, could you give a rough idea of how much work it would be to handle this edge case the nicer way vs. the deleting-everything way? Here's the Figma for more context. (Happy to answer any questions at standup if any of this isn't clear.)
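The difference between the two options can be sketched as follows. This is a hypothetical Python model, not Fleet's code: reports are an in-memory dict of query ID to stored rows, and the function names are made up for illustration. It assumes the "ideal" option empties any report that exceeds the new cap rather than trimming it, since trimming would also need a rule for which rows to keep.

```python
from typing import Dict, List

# Hypothetical in-memory model: saved query ID -> rows currently stored in its report.
# This only illustrates the two options; it is not Fleet's storage layer.
Reports = Dict[int, List[dict]]


def apply_cap_change_selective(reports: Reports, new_cap: int) -> List[int]:
    """Ideal option: clear only the reports that no longer fit under the new cap.

    Assumption: an over-cap report is emptied entirely and refills on later runs.
    Returns the IDs of the queries whose results were deleted.
    """
    cleared = []
    for query_id, rows in reports.items():
        if len(rows) > new_cap:
            rows.clear()
            cleared.append(query_id)
    return cleared


def apply_cap_change_delete_all(reports: Reports) -> List[int]:
    """Simpler option: delete every query's results whenever the cap changes.

    Returns the IDs of the queries that had results before the change.
    """
    cleared = [query_id for query_id, rows in reports.items() if rows]
    for rows in reports.values():
        rows.clear()
    return cleared


if __name__ == "__main__":
    reports = {
        1: [{"host_id": h} for h in range(1_500)],  # over the new cap
        2: [{"host_id": h} for h in range(300)],    # fits under the new cap
    }
    print(apply_cap_change_selective(reports, new_cap=1_000))  # → [1]
    print(len(reports[2]))  # → 300
```

One consequence of the selective option: raising the cap never clears anything, because no existing report can exceed a cap higher than the one it was filled under. The delete-all option clears reports even when the cap goes up.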

May 16 '24 16:05 rachaelshaw

related: https://github.com/fleetdm/fleet/issues/7766 https://github.com/fleetdm/fleet/issues/15381 https://github.com/fleetdm/fleet/issues/11492

May 16 '24 16:05 nonpunctual

BE: 8 FE: 2-3

May 29 '24 18:05 sharon-fdm

Hey @rachaelshaw, are the TODOs in the product section still TODOs?

Also, which version did we end up estimating? The original version or alternate version.

Jun 03 '24 18:06 noahtalerman

> Hey @rachaelshaw, are the TODOs in the product section still TODOs?

Just the REST API changes (configuration endpoints); I updated the description. (I can also make a draft PR once the changes to the modify configuration formatting are merged.)

> Also, which version did we end up estimating? The original version or alternate version.

@noahtalerman I think the original version, although we glossed over that in estimation; I got the impression that part didn't have a big impact on the estimate. Should I move it to the scratchpad, or leave it in case it does end up making a difference?

Jun 03 '24 19:06 rachaelshaw

The query report cap was divided into 2 separate issues: this one for the UI and #19600 for the config (which will ship first).

Jun 17 '24 21:06 rachaelshaw

FYI @mostlikelee and @rachaelshaw, I moved our convo from design review below for safekeeping.

Noah: Query reports solve the "live query on all hosts" problem.
Noah: Something new for custom columns on the Hosts page. New kind of query.

Tim: When you send a query to all hosts and they're all online, the UI freezes up for a couple of minutes.
Tim: This gets compounded when you save multiple queries in quick succession.
Tim: osquery-perf might not behave like real osquery. Osquery might spread out queries on a specific host.

Noah: What about the Fleet server?

Tim: We could do extra work to spread out the results.

Noah: So we could sink a lot of time into improving the Fleet database, spreading out processing, and testing with real osquery, and still not get to 100,000 results.

Tim: Using Elasticsearch works with 50,000 hosts and 30 queries (1-hour interval). Each query returns one result. One query that returns 10 results per host.

Cost questions
New infra requirements = maintenance for cloud and self-managed users

Noah: Ignoring the cost, sounds great for managed cloud

Tim: Could be a paid feature?
Tim: It would help to settle on a query frequency and number of queries to narrow the scope and the cost analysis.

Noah: Start with 100 queries running every hour.
Noah: Have a minimum value for queries with reports?
Noah: Have a maximum number of queries with reports turned on?
Noah: Does Jamf run every 24 hours? Ask Brock.

Tim: MySQL will be less performant for custom queries.
Tim: Extension attributes have to return a single string.

Jun 27 '24 13:06 noahtalerman

This didn't get designed within the 3-week drafting timeline, so I'm bringing it back to Feature Fest.

Jul 11 '24 14:07 rachaelshaw