vector icon indicating copy to clipboard operation
vector copied to clipboard

`exec` and `http` sources add a `.timestamp` field to the event after decoding

Open brucejxz opened this issue 1 year ago • 7 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When log namespacing is enabled, I expect only the event to be in .

For the socket source, this is the case no matter if decoding configuration is specified or not.

For the exec and http sources, when decoding configuration is added, there is an additional .timestamp field. Also, when there is no decoding configuration, the message is available inside a .message key rather than at ..

Maybe there is a reason for the inconsistency but it was definitely surprising to me. Also, I'm not sure what this additional timestamp field represents since its value is different to %vector.ingest_timestamp .

Configuration

schema:
  log_namespace: true

sources:
  source_exec:
    type: exec
    command: ['echo', '{"log": "123"}']
    mode: scheduled
  source_http_server:
    type: http_server
    address: 0.0.0.0:80
  source_socket:
    type: socket
    address: 0.0.0.0:9000
    mode: tcp

sinks:
  sink_console:
    inputs:
      - source_*
    type: console
    encoding:
      codec: json

Version

0.37.1

Debug Output

No response

Example Data

Commands to generate data

*exec runs automatically*
curl -XPOST -H "Content-Type: application/json" --data '{"log": "123"}' http://127.0.0.1
echo '{"log": "123"}' | nc localhost 9000

Without decoding

{"message":"{\"log\": \"123\"}"}
{"message":"{\"log\": \"123\"}"}
"{\"log\": \"123\"}"

With decoding

{"log":"123","timestamp":"2024-04-30T21:14:58.942264817Z"}
{"log":"123","timestamp":"2024-04-30T21:15:01.208653497Z"}
{"log":"123"}

Additional Context

No response

References

No response

brucejxz avatar Apr 30 '24 21:04 brucejxz

Thanks for this report @brucejxz . This does seem like a bug. The expected behavior, as you note, is that just the event ends up in . when log namespacing is enabled. The rest of the fields should be in metadata.

This added .timestamp is likely the same an %vector.ingest_timestamp.

jszwedko avatar May 01 '24 14:05 jszwedko

Hey @jszwedko. Thanks for the reply. Glad to hear that this isn't expected. Also, I added this transform so you can see that the .timestamp field is actually different:

transforms:
  transform_remap:
    inputs:
      - source_*
    type: remap
    source: |
      .@vector = %vector

Which results in:

{"@vector":{"ingest_timestamp":"2024-05-01T20:27:35.876003474Z","source_type":"exec"},"log":"123","timestamp":"2024-05-01T20:27:35.875959041Z"}
{"@vector":{"ingest_timestamp":"2024-05-01T20:27:39.332853800Z","source_type":"http_server"},"log":"123","timestamp":"2024-05-01T20:27:39.332841489Z"}
{"@vector":{"ingest_timestamp":"2024-05-01T20:27:45.064226909Z","source_type":"socket"},"log":"123"}

brucejxz avatar May 01 '24 20:05 brucejxz

Ah, yes, sorry I meant to say "mostly the same" since they are likely both set to "now".

jszwedko avatar May 01 '24 20:05 jszwedko

@jszwedko what do we want to adjust? do we remove the .timestamp when the .vector.ingest_timestamp exists? I can submit a PR to fix it when you confirm that.

iFurySt avatar Aug 08 '24 14:08 iFurySt

@jszwedko what do we want to adjust? do we remove the .timestamp when the .vector.ingest_timestamp exists? I can submit a PR to fix it when you confirm that.

It's more when log_namespacing is enabled, we don't want to add the timestamp field (but do want to add %vector.ingest_timestamp). A fix would be fantastic :)

I think one issue is that it looks like the log_namespacing is hardcoded to off in the exec source. See:

https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/exec/mod.rs#L263-L264

It should look like:

https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/http_client/client.rs#L240-L245

The http_server source looks ok though:

https://github.com/vectordotdev/vector/blob/aab836b502d3cc23594655d0d72224768122905a/src/sources/http_server.rs#L254-L258

So there might be a different issue there. Granted, I also haven't tried to reproduce this issue yet.

jszwedko avatar Aug 08 '24 14:08 jszwedko

Just confirming that I can reproduce this. It seems like some decoders (like json) might be somehow adding a timestamp even if schema.log_namespace is true.

jszwedko avatar Aug 08 '24 14:08 jszwedko

I can take a look at this. I'm assuming there's no work happening here.

Tried locally, still reproducible.

rwakulszowa avatar Oct 19 '24 18:10 rwakulszowa