Parsing stops when hitting UTF-8 replacement char
We are using mtail to parse some specific logs where sometimes there are replacement chars(�) in strings. It seems like when mtail hits a log line with this char, it stops parsing logs after this log line.
mtail version: mtail version 3.0.0-rc50
mtail program:
counter response_total by code
counter count_lines
const ONELINER /oneliner/
ONELINER {
/<result code=\"(?P<code>1000|1300|1301|1500|2002|2200|2201|2202|2302|2303|2306|2308|greeting)\">/ {
response_total[$code]++
}
}
/$/ {
count_lines++
}
Sample logs - non-working:
Nov 29 04:02:02.242 host.example.tld service INFO [18636C07-1:foobar:1.2.3.4] service.ioLoop 123 : Transaction took: 0.500 seconds
Nov 29 04:02:02.242 host.example.tld service INFO Loop(responseData -> oneliner) <result code="1000"> <contact:name>FOO�BAR LOG</contact:name>
Nov 29 04:02:02.242 host.example.tld service INFO [18636C07-1:foobar:1.2.3.4] service.ioLoop 123 : Transaction took: 0.500 seconds
Sample logs - working:
Nov 29 04:02:02.242 host.example.tld service INFO [18636C07-1:foobar:1.2.3.4] service.ioLoop 123 : Transaction took: 0.500 seconds
Nov 29 04:02:02.242 host.example.tld service INFO Loop(responseData -> oneliner) <result code="1000"> <contact:name>FOOSBAR LOG</contact:name>
Nov 29 04:02:02.242 host.example.tld service INFO [18636C07-1:foobar:1.2.3.4] service.ioLoop 123 : Transaction took: 0.500 seconds
Running with -one_shot, with replacement char, I get count_lines{prog="test"} 2and without it,count_lines{prog="test"} 3`
Is this a known problem and how can I get more info on why is it failing?
cc: @smrekarm12
This is definitely a bug, thanks for finding it!
The problem is probably with how the file reader handles UTF-8 runes, and probably it's getting stuck by not discarding this after failing to convert the rune.
Can you find the INFO logs for mtail when this happens? https://google.github.io/mtail/Troubleshooting.html#deployment-problems has the location of the INFO log.
I ran mtail with -logtostderr, -one_shot and -v=2 and the output is attached in log files.
Non-working log file: mtail_debug_nonworking.log
Working log file: mtail_debug_working.log
From the logs, I can only see the working one having one more decode.go:52] sendline log line and the non-working log is 2 bytes bigger, probably due to utf-8 rune.
Any news regarding this issue? Just so we know if we should find another solution or wait?
I can't reproduce this on the latest build:
jaq% ./mtail -progs eg.mtail -logs nonworking.log --one_shot --logtostderr --vmodule=decode=2 --one_shot_format=prometheus
I0527 13:16:08.084292 2760023 main.go:119] mtail version v3.0.1-58-ge7d279d7-dirty git revision e7d279d75b1b90b6f76e5e770d62e12ce651ecd1 go version go1.23-20240419-RC02 cl/626470163 +7f76c00fc5 X:fieldtrack,boringcrypto go arch amd64 go os linux
I0527 13:16:08.084423 2760023 main.go:120] Commandline: ["./mtail" "-progs" "eg.mtail" "-logs" "nonworking.log" "--one_shot" "--logtostderr" "--vmodule=decode=2" "--one_shot_format=prometheus"]
I0527 13:16:08.084985 2760023 store.go:189] Starting metric store expiry loop every 1h0m0s
I0527 13:16:08.085462 2760023 export.go:237] Export loop disabled.
I0527 13:16:08.087469 2760023 runtime.go:187] Loaded program eg.mtail
I0527 13:16:08.087758 2760023 logstream.go:74] Parsed url as /home/jaq/src/mtail/nonworking.log
I0527 13:16:08.087976 2760023 decode.go:64] sendline
I0527 13:16:08.088000 2760023 decode.go:64] sendline
I0527 13:16:08.087975 2760023 filestream.go:278] signalling stop at next EOF
I0527 13:16:08.088384 2760023 decode.go:64] sendline
I0527 13:16:08.088478 2760023 tail.go:295] Tailing /home/jaq/src/mtail/nonworking.log
I0527 13:16:08.088602 2760023 tail.go:316] pollPattern(/home/jaq/src/mtail/nonworking.log): no polling loop in oneshot mode
I0527 13:16:08.088636 2760023 tail.go:367] No gc loop in oneshot mode.
I0527 13:16:08.088672 2760023 mtail.go:141] Listening on [::]:3903
I0527 13:16:08.088905 2760023 runtime.go:295] END OF LINE
I0527 13:16:08.088930 2760023 runtime.go:296] processed 3 lines
I0527 13:16:08.089543 2760023 vm.go:1059] VM "eg.mtail" finished
I0527 13:16:08.089647 2760023 mtail.go:154] Shutdown requested.
# HELP count_lines defined at eg.mtail:2:9-19
# TYPE count_lines counter
count_lines{prog="eg.mtail"} 3
# HELP response_total defined at eg.mtail:1:9-22
# TYPE response_total counter
response_total{code="1000",prog="eg.mtail"} 1
I get the same results from both of the example files you posted at the start.
Great info. I will pass that forward and reopen a ticket if it's not solved in the new version.
Thanks!