add trivial execution-time profiling
What if you could add ?profile=true to a query and get some numbers back? That'd be really cool.
We already have tracing/spans, but right now they only produce data if you have a tracing backend configured for them to report to. This adds a fancy wrapper that lets us generate our own tracing data and, if ?profile=true is set, include it in the response to the request.
We track wall-clock execution time, optional arbitrary key/value pairs, and heap usage before and after, plus the maximum heap size (as reported by the runtime) and the number of GC runs.
I did this in another branch, but that one isn't ready to merge yet and I wanted this for other things, so here's a separate PR. In this branch I also added heap estimates. This would be helpful for things like dx, and especially for working with the RowCache stuff (though because of mmap, there's no real way for us to check actual physical memory usage).
A note so it doesn't get lost: some testing suggests this has enormous runtime performance costs in some cases. Running a couple hundred queries against 2,000 fragments with a hot cache, the non-profiled run took about 1.5 seconds and the same run with profiling took about 1.5 minutes, roughly 60 times slower. I want to know where that cost comes from before I merge this, because it might mean something is fundamentally broken with this design.