vector icon indicating copy to clipboard operation
vector copied to clipboard

Ring buffer / backtrace event handling

Open ilinas opened this issue 3 years ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

We have a bunch of services that output very detailed verbose logs. But those are only ever useful if an error occurs, which means that when the system is functioning normally, we are collecting gigabytes and gigabytes of useless logs just in case.

What we would like to be able to do is to have the logs in a buffer that outputs nothing until a specific condition is met, then it dumps the contents of the buffer for further processing.

Attempted Solutions

Had some success with the reduce transform collecting multiple log messages into a single event. However that is not ideal, because it changes the format of the event and, in case of a complex event with multiple properties of metadata, it is not trivial to handle them.

Outputting the events verbatim is preferred, because no special handling is needed on the receiving side.

Proposal

I would imagine it would work in a similar manner to reduce transform with ends_when parameter. When ends_when is triggered, flush the buffer.

References

The feature is available in some form in multiple logging technologies:

  • https://github.com/odygrd/quill/wiki/6.-Backtrace-Logging
  • https://tersesystems.github.io/terse-logback/1.0.0/guide/ringbuffer/

The paper explaining ring buffer logging:

  • https://hillside.net/plop/plop2k/proceedings/Marick/Marick.pdf

Version

No response

ilinas avatar Nov 16 '22 14:11 ilinas

This is a neat idea, thanks for the suggestion @ilinas .

jszwedko avatar Nov 28 '22 22:11 jszwedko

I have done some thinking on how this could be implemented in Vector, and came up with the following. Please forgive me if some of this makes no sense, I am just getting familiarized with Vector source code and how it works internally.

Let's assume we have a gate transform like this. It has an initial state, which is either open or closed and an open and close conditions to change the state when a condition is met.

transforms:
  log_gate:
    type: gate
    inputs:
      - parse_logs
    state: closed
    open: '.message.level == "Error"'
    close: '.message.flag == "Close"'

When the gate is closed, it works just like a filter, all events are discarded.

                                   /|
|5|4|X|3|2|1|0|9|8|7|6|5|E|4|3|2|1| |
     ^                   ^        |/
     Close gate          Error

When the open condition is matched, e.g. an error message, the gate opens, and the messages are allowed to go through.

                                   /|
          |5|4|X|3|2|1|0|9|8|7|6|5|E|
                                  |/

When the close condition i matched, the gate is closed again.

                                   /|
                              |5|4|X|3|2|1|0|9|8|7|6|5|E|
                                  |/

All the messages between open and close inclusive are passed though, and the transform starts to act like a filter again.

                                   /|
                                  | |   ->   |X|3|2|1|0|9|8|7|6|5|E|
                                  |/

This is already useful in some cases, e.g. if you are filtering out debug logs by default, but want to let them pass through e.g. when a certain user logs in.

But what you would really want in most cases is to know what happened before the error. To do that, you'd need to keep the previous message in a ring buffer, and flush it when the open condition is met.

                                   /|
                                  | |   ->   |X|3|2|1|0|9|8|7|6|5|E|4|3|2|1|
                                  |/                               |Buffer | 

A simple naive implementation could just keep the messages in memory in a collection (vec_deque?) internally. However unlike a reduce transform, which only aggregates certain properties, this one would have to keep the events verbatim, which would have the same memory considerations as the sink buffers.

Speaking of which, Vector does have a buffering model already implemented, which maybe could be reused here? Although that would certainly violate the "buffer at the sink" principle.

ilinas avatar Apr 30 '24 10:04 ilinas

A simple version without a buffer could look something like this:

impl FunctionTransform for Gate {
    fn transform(&mut self, output: &mut OutputBuffer, event: Event) {
        let (open_gate, event) = match self.open.as_ref() {
            Some(condition) => {
                let (result, event) = condition.check(event);
                (result, event)
            }
            _ => (false, event)
        };

        let (close_gate, event) = match self.close.as_ref() {
            Some(condition) => {
                let (result, event) = condition.check(event);
                (result, event)
            }
            _ => (false, event)
        };

        if open_gate {
            self.state = GateState::Open;
            output.push(event);
        } else if close_gate {
            self.state = GateState::Closed;
            output.push(event);
        } else if self.state == GateState::Open {
            output.push(event);
        } else {
            emit!(GateEventDiscarded);
        }
    }
}

ilinas avatar Apr 30 '24 10:04 ilinas