`exec` and `http` sources add a `.timestamp` field to the event after decoding
Problem
When log namespacing is enabled, I expect only the event itself to be at the root path `.`.
For the socket source, this is the case whether or not a decoding configuration is specified.
For the exec and http sources, when a decoding configuration is added, there is an additional `.timestamp` field. Also, when there is no decoding configuration, the message ends up under a `.message` key rather than at `.`.
Maybe there is a reason for the inconsistency, but it was definitely surprising to me. I'm also not sure what this additional timestamp field represents, since its value differs from `%vector.ingest_timestamp`.
Configuration
```yaml
schema:
  log_namespace: true

sources:
  source_exec:
    type: exec
    command: ['echo', '{"log": "123"}']
    mode: scheduled
  source_http_server:
    type: http_server
    address: 0.0.0.0:80
  source_socket:
    type: socket
    address: 0.0.0.0:9000
    mode: tcp

sinks:
  sink_console:
    inputs:
      - source_*
    type: console
    encoding:
      codec: json
```
Version
0.37.1
Debug Output
No response
Example Data
Commands to generate data
*exec runs automatically*
```sh
curl -XPOST -H "Content-Type: application/json" --data '{"log": "123"}' http://127.0.0.1
echo '{"log": "123"}' | nc localhost 9000
```
Without decoding
{"message":"{\"log\": \"123\"}"}
{"message":"{\"log\": \"123\"}"}
"{\"log\": \"123\"}"
With decoding
{"log":"123","timestamp":"2024-04-30T21:14:58.942264817Z"}
{"log":"123","timestamp":"2024-04-30T21:15:01.208653497Z"}
{"log":"123"}
Additional Context
No response
References
No response
Thanks for this report @brucejxz. This does seem like a bug. The expected behavior, as you note, is that just the event ends up at `.` when log namespacing is enabled; the rest of the fields should be in metadata.
This added `.timestamp` is likely the same as `%vector.ingest_timestamp`.
Hey @jszwedko. Thanks for the reply. Glad to hear that this isn't expected. Also, I added this transform so you can see that the .timestamp field is actually different:
```yaml
transforms:
  transform_remap:
    inputs:
      - source_*
    type: remap
    source: |
      .@vector = %vector
```
Which results in:
{"@vector":{"ingest_timestamp":"2024-05-01T20:27:35.876003474Z","source_type":"exec"},"log":"123","timestamp":"2024-05-01T20:27:35.875959041Z"}
{"@vector":{"ingest_timestamp":"2024-05-01T20:27:39.332853800Z","source_type":"http_server"},"log":"123","timestamp":"2024-05-01T20:27:39.332841489Z"}
{"@vector":{"ingest_timestamp":"2024-05-01T20:27:45.064226909Z","source_type":"socket"},"log":"123"}
Ah, yes, sorry I meant to say "mostly the same" since they are likely both set to "now".
@jszwedko what do we want to adjust? Do we remove the `.timestamp` field when `%vector.ingest_timestamp` exists? I can submit a PR to fix it once you confirm.
It's more that when log namespacing is enabled, we don't want to add the `.timestamp` field at all (but we do still want to add `%vector.ingest_timestamp`). A fix would be fantastic :)
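To make that intended behavior concrete, here is a minimal, self-contained Rust sketch of the branching described above. The names here (`LogNamespace`, `finalize_event`, the string-keyed maps) are stand-ins for illustration only and do not correspond to Vector's actual internal APIs:

```rust
use std::collections::BTreeMap;

// Stand-in for Vector's log namespace setting; the real type lives in
// vector-core and carries more behavior than this.
#[derive(Clone, Copy, PartialEq)]
enum LogNamespace {
    Vector, // schema.log_namespace: true
    Legacy, // schema.log_namespace: false (the default)
}

// Desired rule: the ingest timestamp always goes into %vector metadata,
// while the top-level .timestamp field is only added under the legacy
// namespace. The bug is that decoding paths add it unconditionally.
fn finalize_event(
    log_namespace: LogNamespace,
    event: &mut BTreeMap<String, String>,
    metadata: &mut BTreeMap<String, String>,
    now: &str,
) {
    metadata.insert("ingest_timestamp".to_string(), now.to_string());

    if log_namespace == LogNamespace::Legacy {
        event.insert("timestamp".to_string(), now.to_string());
    }
}

fn main() {
    let mut event = BTreeMap::from([("log".to_string(), "123".to_string())]);
    let mut metadata = BTreeMap::new();

    finalize_event(LogNamespace::Vector, &mut event, &mut metadata, "2024-05-01T20:27:35Z");

    // With log namespacing enabled, the event stays {"log": "123"} and the
    // timestamp lives only in the metadata map.
    println!("event: {event:?}");
    println!("metadata: {metadata:?}");
}
```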
I think one issue is that the log namespace looks hardcoded to off in the exec source. See:
https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/exec/mod.rs#L263-L264
It should look like:
https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/http_client/client.rs#L240-L245
The http_server source looks ok though:
https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/http_server.rs#L254-L258
So there might be a different issue there. Granted, I also haven't tried to reproduce this issue yet.
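For reference, here is a small, self-contained Rust sketch of the resolution the comment above is asking for. The function name (`resolve_namespace`) and its boolean parameters are hypothetical; they only illustrate the pattern of merging the global `schema.log_namespace` setting with a per-source override instead of hardcoding the legacy namespace:

```rust
#[derive(Clone, Copy, Debug)]
enum LogNamespace {
    Vector, // new, namespaced layout
    Legacy, // old layout with top-level .timestamp, .message, etc.
}

// Hypothetical helper: the global schema.log_namespace flag merged with a
// per-source override, falling back to the global value when the source
// does not set one. Hardcoding LogNamespace::Legacy skips this resolution.
fn resolve_namespace(global: bool, source_override: Option<bool>) -> LogNamespace {
    if source_override.unwrap_or(global) {
        LogNamespace::Vector
    } else {
        LogNamespace::Legacy
    }
}

fn main() {
    // schema.log_namespace: true, no per-source override -> Vector
    println!("{:?}", resolve_namespace(true, None));
    // global off, but the source opts in -> Vector
    println!("{:?}", resolve_namespace(false, Some(true)));
    // everything off -> Legacy
    println!("{:?}", resolve_namespace(false, None));
}
```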
Just confirming that I can reproduce this. It seems like some decoders (such as `json`) might somehow be adding a timestamp even when `schema.log_namespace` is true.
I can take a look at this. I'm assuming there's no work happening here.
Tried locally, still reproducible.