Note for design: Log customization and tuning
I've gotten feedback that we should take special care around customization of log output.
Currently, the only thing we really let users configure is the log level, and we have no specific support for setting the log level of our dependencies, including pingora.
As logging is a primary "user interface" for River, we should make another design pass on the desired capabilities and options for logging.
Some additional suggested features include:
- reducing the percentage of logs that are printed, allowing for statistical sampling of high-volume logs, e.g. tracing 1%/10%/100% of all connections verbosely
- tuning the contents and format of primary logs, allowing for different users to include different information relevant to their use cases
- filtering and anonymization of logs, for example not logging headers that match some sort of regex (to remove PII), or logging hashed values instead of raw ones, such as the hash of an entire header, so that duplicate requests can still be observed
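
To make the sampling and hashing ideas above concrete, here is a minimal sketch using only the Rust standard library. Everything in it is hypothetical: the function names, the idea of keying sampling on a numeric connection ID, and the plain list of sensitive header names (standing in for the regex match mentioned above) are assumptions for illustration, not existing River or pingora APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash any hashable value to a u64.
/// `DefaultHasher::new()` uses fixed keys, so results are repeatable within
/// one build, but the algorithm is not guaranteed to stay the same across
/// Rust releases. A real design would likely pick an explicit hash function.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical sampling decision: trace roughly `percent` percent of
/// connections verbosely. Hashing the ID (instead of drawing a random
/// number) means every log line for one connection gets the same answer.
fn should_trace_verbosely(connection_id: u64, percent: u8) -> bool {
    let percent = u64::from(percent.min(100));
    hash_of(&connection_id) % 100 < percent
}

/// Hypothetical anonymization step: if the header name is in the
/// `sensitive` list (compared case-insensitively), log a hash of the value
/// instead of the value itself. Equal values give equal hashes, so
/// duplicate requests remain visible.
fn loggable_header_value(name: &str, value: &str, sensitive: &[&str]) -> String {
    if sensitive.iter().any(|s| s.eq_ignore_ascii_case(name)) {
        format!("hash:{:016x}", hash_of(value))
    } else {
        value.to_string()
    }
}
```

With this shape, `should_trace_verbosely(id, 100)` is always true and `should_trace_verbosely(id, 0)` is always false, which maps directly onto the 1%/10%/100% settings suggested above.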
CC @branlwyd
Also note: we should make sure we have settings for emitting structured logs that other tooling can consume, e.g. OpenTelemetry, and that are compatible with Loki, Prometheus, etc.
I hope we can support full customization through placeholders, instead of adding, deleting, or renaming fields in a pre-set JSON structure like Caddy does. That way there is no need to worry about whether CLF or JSON is the better format, because users can fully customize the output themselves.
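
As a rough sketch of what placeholder-based formatting could look like, the function below substitutes `{name}` placeholders from a map of request fields. The `{name}` syntax, the field names, and the `-` fallback for unknown fields are all assumptions made for illustration; none of them come from River or Caddy.

```rust
use std::collections::HashMap;

/// Render a log line from a user-supplied template.
/// `{name}` is replaced with the matching entry in `fields`; unknown names
/// become "-" (as CLF does for missing values). An unterminated `{` is
/// emitted literally.
fn render_log_line(template: &str, fields: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        match after_open.find('}') {
            Some(close) => {
                let name = &after_open[..close];
                out.push_str(fields.get(name).copied().unwrap_or("-"));
                rest = &after_open[close + 1..];
            }
            None => {
                // No closing brace: keep the remainder as-is and stop.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With fields `client_ip`, `method`, `path`, and `status`, the template `{client_ip} "{method} {path}" {status}` yields a CLF-like line, while `ip={client_ip} status={status}` yields key=value output from the same data. Note that this sketch has no way to escape a literal `{`, which a real design would need before users could write JSON templates.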
Additionally, we need to support log rotation, like Caddy does.
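
For discussion, here is a minimal sketch of the two decisions a size-based rotation scheme has to make: when to rotate, and how to shuffle the old files. It assumes a numbered-suffix layout (`river.log.1`, `river.log.2`, ...) of the kind logrotate uses, which is not necessarily how Caddy names its rotated files. The function names and the `river.log` file name are hypothetical.

```rust
/// Rotate when the next write would push the file past `max_size` bytes.
fn needs_rotation(current_size: u64, next_write: u64, max_size: u64) -> bool {
    current_size.saturating_add(next_write) > max_size
}

/// List the (from, to) renames to perform, in order, when rotating `base`
/// and keeping at most `keep` old files. Older files are shifted first so
/// nothing is overwritten before it has been moved; whatever already sits
/// in slot `keep` is replaced. With `keep == 0` no old files are kept and
/// the caller would simply truncate `base`.
fn rotation_renames(base: &str, keep: u32) -> Vec<(String, String)> {
    let mut renames = Vec::new();
    for n in (1..keep).rev() {
        renames.push((format!("{base}.{n}"), format!("{base}.{}", n + 1)));
    }
    if keep > 0 {
        renames.push((base.to_string(), format!("{base}.1")));
    }
    renames
}
```

Keeping the plan as pure functions like this separates the policy (size limit, number of files kept) from the file system calls, so the policy can be configured and tested on its own.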