Fleet returns 500 error after carve request timeout.
Fleet version: v4.41.1, osquery version: 5.11.0
💥 Actual behavior
When osquery sends carve blocks, Fleet validates that blocks are received in the correct order. Generally, this works smoothly as osquery always sends the blocks sequentially.
When a customer attempted to carve log files to diagnose connectivity issues in fleetd, Fleet began generating alerts for 500 errors on the osquery/block endpoint.
This was the error in Fleet (with varying values for id):
{"carve_id":180600,"component":"http","err":"validate carve block: block_id does not match expected block (1): 4"}
When reviewing incoming traffic from affected hosts, each of these errors was preceded by a request that failed due to a client timeout.
I found this osquery ticket, which indicates that the carver continues to send requests regardless of previous failures:
https://github.com/osquery/osquery/issues/6742
In reviewing the implementation, this still seems to be the case:
https://github.com/osquery/osquery/blob/2bd7e8660881b2863811a04e34f8ff3d2d748a3d/osquery/carver/carver.cpp#L334
If this is expected behavior for osquery, Fleet should not return a 500 error when carve blocks are not sent in order.
🧑‍💻 Steps to reproduce
- TODO
- TODO
🕯️ More info (optional)
N/A
@ksatter We will assume that the fix is to replace the 500 with a 4xx. The root cause should be fixed in osquery core, but it looks like an old bug that is not getting attention.
100% in agreement
Carve blocks out of sync, Fleet's calm response brings peace, Errors no longer speak.
Carve without worry, Order shifts, Fleet stays serene, Clear path in the clouds.