Update Syslog source to accept non UTF-8 encoding in syslog message

Open Neko-Follower opened this issue 1 year ago • 7 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Vector drops logs when it encounters a syslog message containing non-UTF-8 characters. Could you add an option to replace non-UTF-8 characters with U+FFFD, or to pass non-UTF-8 text through as-is, as Promtail does?
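
To illustrate the two behaviours being contrasted here, the following is a minimal standalone Rust sketch using only the standard library. It is not Vector's code: the function names `decode_strict` and `decode_lossy` and the sample message are made up for illustration. It shows a strict decode rejecting a message with a stray Latin-1 byte, and a lossy decode substituting U+FFFD instead.

```rust
/// Strict decoding: fails if `bytes` is not valid UTF-8.
/// (Hypothetical helper, for illustration only.)
fn decode_strict(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_owned)
}

/// Lossy decoding: every invalid sequence becomes U+FFFD.
/// (Hypothetical helper, for illustration only.)
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
    let msg: &[u8] = b"<13>May  8 07:35:44 host app: caf\xE9 ready";

    // Strict decoding rejects the whole message.
    assert!(decode_strict(msg).is_err());

    // Lossy decoding keeps the message and marks the bad byte.
    let lossy = decode_lossy(msg);
    assert!(lossy.contains("caf\u{FFFD} ready"));
    println!("{lossy}");
}
```

With the lossy approach the rest of the message survives, which is the behaviour requested above; passing the bytes through unchanged would need a byte-oriented message type rather than a `String`.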

Configuration

No response

Version

0.37.1-distroless-static

Debug Output

No response

Example Data

2024-05-08T07:35:43.209847Z DEBUG source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44600}: vector::sources::util::net::tcp: Accepted a new connection. peer_addr=172.22.0.4:44600

2024-05-08T07:35:44.293974Z ERROR source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44594}: vector::internal_events::codecs: Failed framing bytes. error=Unable to decode input as UTF8 error_code="decoder_frame" error_type="parser_failed" stage="processing" internal_log_rate_limit=true

2024-05-08T07:35:44.294029Z ERROR source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44594}: vector::internal_events::codecs: Internal log [Failed framing bytes.] is being suppressed to avoid flooding.

Additional Context

No response

References

No response

Neko-Follower avatar May 08 '24 08:05 Neko-Follower

Agreed, this could be modeled like the existing decoding.codec.json.lossy option which replaces invalid UTF-8 characters.

jszwedko avatar May 08 '24 13:05 jszwedko

We'd be happy to see a PR for this if someone is motivated! It should be a relatively straightforward change.

jszwedko avatar May 08 '24 13:05 jszwedko

I have the same problem with the Fluent source.

osas1111 avatar Jun 26 '24 17:06 osas1111

Hi, I'm interested in contributing to this!

kevinmingtarja avatar Aug 10 '24 03:08 kevinmingtarja

Great! We'd be happy to review a PR. You can see https://github.com/vectordotdev/vector/pull/17628 as an example of when it was added to the JSON decoder.

jszwedko avatar Aug 12 '24 14:08 jszwedko

Hi @jszwedko, just to confirm, it seems like the lossy option for syslog has been added in this PR #17680.

Is there anything else missing from that PR that could be causing this bug?

I wrote a unit test for /lib/codecs/src/decoding/format/syslog.rs and was able to verify that SyslogDeserializer::default().parse() does replace non-UTF-8 characters with the replacement character.

kevinmingtarja avatar Aug 13 '24 08:08 kevinmingtarja

Ah, and so it was. I forgot this issue is about the syslog source rather than the syslog decoder. I think we still need to add the option to the source.

jszwedko avatar Aug 13 '24 14:08 jszwedko
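
One possible reading of the "Failed framing bytes" error in the log above is that the bytes are rejected before a lossy decoder ever sees them. The Rust sketch below models that situation under stated assumptions: `frame_strict`, `frame_bytes`, and `decode_lossy` are hypothetical stand-ins written for this example, newline framing is chosen only for simplicity, and none of this is taken from Vector's source.

```rust
/// Hypothetical framer that insists every frame is valid UTF-8.
fn frame_strict(input: &[u8]) -> Result<Vec<String>, std::str::Utf8Error> {
    input
        .split(|b| *b == b'\n')
        .filter(|frame| !frame.is_empty())
        .map(|frame| std::str::from_utf8(frame).map(str::to_owned))
        .collect()
}

/// Hypothetical framer that hands raw bytes to the next stage untouched.
fn frame_bytes(input: &[u8]) -> Vec<Vec<u8>> {
    input
        .split(|b| *b == b'\n')
        .filter(|frame| !frame.is_empty())
        .map(|frame| frame.to_vec())
        .collect()
}

/// Hypothetical lossy decoder: invalid sequences become U+FFFD.
fn decode_lossy(frame: &[u8]) -> String {
    String::from_utf8_lossy(frame).into_owned()
}

fn main() {
    let stream = b"<13>app: ok\n<13>app: caf\xE9\n";

    // A strict framer fails first, so a lossy decoder behind it never runs.
    assert!(frame_strict(stream).is_err());

    // A byte-preserving framer lets the lossy decoder do its job.
    let events: Vec<String> = frame_bytes(stream)
        .iter()
        .map(|frame| decode_lossy(frame))
        .collect();
    assert_eq!(events, vec!["<13>app: ok", "<13>app: caf\u{FFFD}"]);
}
```

If the source behaves roughly like the first path, a lossy option on the decoder alone would not be enough, which would be consistent with the conclusion that the option still needs to be added to the source.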