vector
vector copied to clipboard
File source: wrong offset if line is written without delimiter
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
In case a message is written partially at first (without delimiter), the offset calculation of the line is incorrect.
Configuration
The issue can be reproduced by using this script:
# fake-logger.sh
file1=tests/tdata/fake-log-1.log
echo -n "a" > $file1
sleep 3
echo "b" >> $file1
echo "c" >> $file1
file2=tests/tdata/fake-log-2.log
echo "ab" > $file2
echo "c" >> $file2
Vector config:
# vector.yaml
data_dir: tests/tdata
sources:
log:
include:
- tests/tdata/fake-log*.log
read_from: beginning
fingerprint:
strategy: device_and_inode
type: file
offset_key: offset
sinks:
console:
type: console
inputs:
- log
encoding:
codec: json
Version
0.41
Debug Output
❯ cargo vdev run --debug tests/vector.yaml
Compiling file-source v0.1.0 (/home/msidorov/vector/lib/file-source)
Compiling vector v0.41.0 (/home/msidorov/vector)
Compiling vector-lib v0.1.0 (/home/msidorov/vector/lib/vector-lib)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 32.37s
Running `target/debug/vector --config tests/vector.yaml`
2024-08-15T20:24:44.847421Z INFO vector::app: Log level is enabled. level="info"
2024-08-15T20:24:44.849546Z INFO vector::app: Loading configs. paths=["tests/vector.yaml"]
2024-08-15T20:24:44.855881Z INFO vector::topology::running: Running healthchecks.
2024-08-15T20:24:44.856180Z INFO vector::topology::builder: Healthcheck passed.
2024-08-15T20:24:44.856259Z INFO vector: Vector has started. debug="true" version="0.41.0" arch="x86_64" revision=""
2024-08-15T20:24:44.856386Z INFO source{component_kind="source" component_id=log component_type=file}: vector::sources::file: Starting file server. include=["tests/tdata/fake-log*.log"] exclude=[]
2024-08-15T20:24:44.856831Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: file_source::checkpointer: Attempting to read legacy checkpoint files.
2024-08-15T20:25:01.263737Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=tests/tdata/fake-log-1.log
2024-08-15T20:25:03.313237Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=tests/tdata/fake-log-2.log
{"file":"tests/tdata/fake-log-1.log","host":"...","message":"ab","offset":1,"source_type":"file","timestamp":"2024-08-15T20:25:03.314009749Z"}
{"file":"tests/tdata/fake-log-1.log","host":"...","message":"c","offset":3,"source_type":"file","timestamp":"2024-08-15T20:25:03.314240274Z"}
{"file":"tests/tdata/fake-log-2.log","host":"...","message":"ab","offset":0,"source_type":"file","timestamp":"2024-08-15T20:25:03.314342546Z"}
{"file":"tests/tdata/fake-log-2.log","host":"...","message":"c","offset":3,"source_type":"file","timestamp":"2024-08-15T20:25:03.314438288Z"}
As seen here, the offset of the first event in tests/tdata/fake-log-1.log is 1, which is unexpected
It seems like the offset was indeed set to the offset of the newly written part of the line