cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

log,allocatorimpl: log allocation messages to a separate log file

Open lidorcarmel opened this issue 3 years ago • 4 comments

The allocator logs are written to the main log file, which means that those messages sometimes spam the main log, and also, we can't increase verbosity much. Because of that most allocation logs are now under a vlog and therefore rarely used.

This PR moves all allocation logs to a separate channel.

The next step would be to move some log messages from vlog to be printed always, which should help debugging.

Note that some unit tests were relying on VEventf tracing, even when vmodule is not enabled, instead we now explicitly set vmodule in those tests.

Step 1 for https://github.com/cockroachdb/cockroach/issues/82836.

Release note: None

lidorcarmel avatar Aug 05 '22 20:08 lidorcarmel

This change is Reviewable

cockroach-teamcity avatar Aug 05 '22 20:08 cockroach-teamcity

cc @knz and @nvanbenschoten

lidorcarmel avatar Aug 10 '22 15:08 lidorcarmel

we probably don't want a KV channel now and later break that to 2 or 3 channels.

I don't know about that. What about using multiple separate structured event types, such that we can keep the option to do post-loggging filtering on event type?

knz avatar Aug 10 '22 21:08 knz

I think some splitting up of log files is worth it. I spend a lot of time just iterating on grep -v for most incidents, and even the KV logs have lots of information that isn't always relevant. I think of the allocator logging that is likely to be added like the pebble logs. Usually you don't want to see them, unless you suspect something is off with pebble or the allocator. So they shouldn't mix into the main file or even into a main KV file if it existed, as they would simply be too verbose.

The message related to replication changes (i.e. when the allocator actions are taken) are already fairly verbose, too. I think it makes sense to have a message about the decision in the main log (moving replica X away, or replica Y is now deleted) but not all of the intermediate steps - adding learner, sending snapshot, entering joint quorum, leaving joint quorum, etc. So maybe these messages could also go to the distribution channel together with more verbose info about the allocator.

We would then have

  • DEV
  • KV-DISTRIBUTION

It probably makes sense to split out KV as well (which raises questions about whether to split further into KV-SERVER and KV-CLIENT in light of these living on separate machines in multi-tenant settings which is supposed to be the new normal).

But I am a lot less sure about these things due to this affecting all customers. We can't really experiment with these splits, which is sort of sad. It would be nice if these logging channels could be "virtual" enough so that customers don't have to know about all of them. That said, I think KV-SERVER, KV-DISTRIBUTION, KV-CLIENT would be a justifiable split that would carry us for the foreseeable future. By default KV-{CLIENT,SERVER} would go into the main log file, and distribution would get its own file. Since the log channel is encoded in each message, CRL engs could use a (updated) merge-logs that lets you select the channels to watch.

(I hope my mental model of log channels is right).

tbg avatar Aug 11 '22 09:08 tbg

@knz in which cases would we create a channel without directing it to a separate file?

lidorcarmel avatar Aug 11 '22 15:08 lidorcarmel

in which cases would we create a channel without directing it to a separate file?

In the case we might want to advise a user to do this redirection during, say, a support case.

knz avatar Aug 11 '22 15:08 knz

FWIW: when multiple log channels are configured to route to the same file, the channel ID is still included in each event, so we can split the file afterwards.

knz avatar Aug 18 '22 10:08 knz

@tbg , to clarify, adding these messages means (to me) replacing all log.* with log.KvDistribution.* in those 4 files,

No, just the exact lines, since they are about data movement and are almost always consequences of allocation decisions. They have no context when they appear in the main log, if allocator decisions are elsewhere. Probably one could craft an argument about purism of the allocator logs but if it were my judgment call I'd move them over as well (though not a hill I'd die on).

is that what you had in mind? I don't think we would normally want a single file logging to more than one channel, but maybe this is naive, not sure.

I'm not sure about that either but I tend to think that sounds too rigid. We should base the channel based on what's being logged, not based on what file it is in, which often has additional constraints that can't always be sidestepped.

tbg avatar Aug 29 '22 11:08 tbg

thanks all! bors r+

lidorcarmel avatar Aug 30 '22 20:08 lidorcarmel

Build succeeded:

craig[bot] avatar Aug 30 '22 23:08 craig[bot]