log,allocatorimpl: log allocation messages to a separate log file
The allocator logs are written to the main log file, which means that those messages sometimes spam the main log, and also, we can't increase verbosity much. Because of that most allocation logs are now under a vlog and therefore rarely used.
This PR moves all allocation logs to a separate channel.
The next step would be to move some log messages from vlog to be printed always, which should help debugging.
Note that some unit tests were relying on VEventf tracing, even when vmodule is not enabled, instead we now explicitly set vmodule in those tests.
Step 1 for https://github.com/cockroachdb/cockroach/issues/82836.
Release note: None
cc @knz and @nvanbenschoten
we probably don't want a KV channel now and later break that to 2 or 3 channels.
I don't know about that. What about using multiple separate structured event types, such that we can keep the option to do post-loggging filtering on event type?
I think some splitting up of log files is worth it. I spend a lot of time just iterating on grep -v for most incidents, and even the KV logs have lots of information that isn't always relevant.
I think of the allocator logging that is likely to be added like the pebble logs. Usually you don't want to see them, unless you suspect something is off with pebble or the allocator. So they shouldn't mix into the main file or even into a main KV file if it existed, as they would simply be too verbose.
The message related to replication changes (i.e. when the allocator actions are taken) are already fairly verbose, too. I think it makes sense to have a message about the decision in the main log (moving replica X away, or replica Y is now deleted) but not all of the intermediate steps - adding learner, sending snapshot, entering joint quorum, leaving joint quorum, etc. So maybe these messages could also go to the distribution channel together with more verbose info about the allocator.
We would then have
- DEV
- KV-DISTRIBUTION
It probably makes sense to split out KV as well (which raises questions about whether to split further into KV-SERVER and KV-CLIENT in light of these living on separate machines in multi-tenant settings which is supposed to be the new normal).
But I am a lot less sure about these things due to this affecting all customers. We can't really experiment with these splits, which is sort of sad. It would be nice if these logging channels could be "virtual" enough so that customers don't have to know about all of them. That said, I think KV-SERVER, KV-DISTRIBUTION, KV-CLIENT would be a justifiable split that would carry us for the foreseeable future. By default KV-{CLIENT,SERVER} would go into the main log file, and distribution would get its own file. Since the log channel is encoded in each message, CRL engs could use a (updated) merge-logs that lets you select the channels to watch.
(I hope my mental model of log channels is right).
@knz in which cases would we create a channel without directing it to a separate file?
in which cases would we create a channel without directing it to a separate file?
In the case we might want to advise a user to do this redirection during, say, a support case.
FWIW: when multiple log channels are configured to route to the same file, the channel ID is still included in each event, so we can split the file afterwards.
@tbg , to clarify, adding these messages means (to me) replacing all log.* with log.KvDistribution.* in those 4 files,
No, just the exact lines, since they are about data movement and are almost always consequences of allocation decisions. They have no context when they appear in the main log, if allocator decisions are elsewhere. Probably one could craft an argument about purism of the allocator logs but if it were my judgment call I'd move them over as well (though not a hill I'd die on).
is that what you had in mind? I don't think we would normally want a single file logging to more than one channel, but maybe this is naive, not sure.
I'm not sure about that either but I tend to think that sounds too rigid. We should base the channel based on what's being logged, not based on what file it is in, which often has additional constraints that can't always be sidestepped.
thanks all! bors r+