Incorrectly checkpointing journald logs when unable to send to sink

Open rbishop opened this issue 3 years ago • 4 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We have Vector collecting logs from systemd units via journald and forwarding them to the CloudWatch Logs (CWL) sink. Sometimes we suspend servers for long periods and resume them later. When this happens, the AWS credentials are no longer valid. Upon restarting the server, Vector starts up and begins attempting to forward logs to CWL, but the API calls fail; eventually Vector exits and is restarted by systemd.

After getting new credentials and restarting Vector, we don't receive any of the logs that were generated while the AWS credentials were stale. My best guess is that Vector incorrectly checkpoints the journald stream even though the logs were never successfully uploaded to CWL.

Configuration

healthchecks.enabled = false

[sources.postgres]
type = "syslog"
mode = "unix"
path = "/run/postgres-audit.socket"

[sources.kernel]
type = "journald"
current_boot_only = true
include_matches = { _TRANSPORT = ["kernel"] }

[transforms.postgres_xform]
type = "remap"
inputs = [ "postgres" ]
source = """
.SYSLOG_IDENTIFIER = "postgres"
"""

[transforms.kernel_xform]
type = "remap"
inputs = [ "kernel" ]
source = """
. = {
  # SYSLOG_IDENTIFIER is used by sink to forward to the appropriate Cloudwatch Logs stream
  "SYSLOG_IDENTIFIER": "kernel",
  "boot_id": ._BOOT_ID,
  "message": .message
}
"""

[sources.pgbouncer]
type = "journald"
current_boot_only = false
include_units = [ "[email protected]" ]

[sources.sshd]
type = "journald"
current_boot_only = false
include_units = [ "sshd" ]

[sources.auditd]
type = "file"
include = [ "/var/log/audit/audit.log*" ]
read_from = "beginning"

[transforms.auditd_xform]
type = "remap"
inputs = [ "auditd" ]
source = """
. |= parse_key_value!(.message)
.SYSLOG_IDENTIFIER = "audit"
"""

[sinks.cloudwatch_pg_audit]
type = "aws_cloudwatch_logs"
inputs = [ "auditd_xform", "postgres_xform", "sshd", "pgbouncer", "kernel_xform" ]
create_missing_group = false
create_missing_stream = false
group_name = "zxcv"
stream_name = "asdf-{{ SYSLOG_IDENTIFIER }}"
region = "us-west-2"
encoding.codec = "json"

Version

vector 0.26.0 (x86_64-unknown-linux-gnu c6b5bc2 2022-12-05)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

rbishop avatar Jan 13 '23 01:01 rbishop

Hey @rbishop sorry to hear you're having problems. Can you please share your configuration?

spencergilbert avatar Jan 13 '23 13:01 spencergilbert

@spencergilbert added to the issue body

rbishop avatar Jan 13 '23 17:01 rbishop

Thanks! This behavior is seen on all of your journald sources? Is the file source watching auditd exhibiting the same?

spencergilbert avatar Jan 13 '23 20:01 spencergilbert

The file source is working properly.

rbishop avatar Jan 17 '23 17:01 rbishop

@rbishop I actually think this is correct behavior from Vector. According to the journald docs, the checkpointing happens after a read: https://vector.dev/docs/reference/configuration/sources/journald/#checkpointing

One way to overcome this issue is to use the acknowledgements feature in sinks, like in the aws_cloudwatch_logs sink: https://vector.dev/docs/reference/configuration/sinks/aws_cloudwatch_logs/#acknowledgements

Note: acknowledgements won't work for the syslog source. There isn't much we can do there because checkpointing isn't supported by the socket interface.
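
For readers landing here, a minimal sketch of what that suggestion could look like when applied to the sink from the configuration in the issue body. The only added line is `acknowledgements.enabled = true`; everything else is copied from the original config. This is an illustration based on the linked sink documentation, not something tested against vector 0.26.0 or confirmed in this thread to resolve the problem.

```toml
# Sketch only: the reporter's sink with end-to-end acknowledgements turned on.
# Assumption: sources that support acknowledgements (journald and file here)
# then hold their checkpoint until this sink confirms delivery.
[sinks.cloudwatch_pg_audit]
type = "aws_cloudwatch_logs"
inputs = [ "auditd_xform", "postgres_xform", "sshd", "pgbouncer", "kernel_xform" ]
create_missing_group = false
create_missing_stream = false
group_name = "zxcv"
stream_name = "asdf-{{ SYSLOG_IDENTIFIER }}"
region = "us-west-2"
encoding.codec = "json"
acknowledgements.enabled = true
```

As the note above says, events arriving through the syslog source on the unix socket have no checkpoint to hold back, so this setting should not be expected to recover those.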

davidhuie-dd avatar Jan 24 '23 22:01 davidhuie-dd

Closing this since this is expected behavior.

fuchsnj avatar Mar 09 '23 14:03 fuchsnj