fix(splunk_hec): validate and flatten fields
Summary
The splunk_hec sink now flattens objects and arrays. Presently, Vector sends nested objects and arrays, which the Splunk HEC API rejects. Integers are also silently dropped. Because of this behaviour, indexed fields that are not at the top level of the Vector log event cannot be referenced.
Flattening is preferable to silently dropping nested objects and arrays because no data is lost. It is also particularly useful for indexing fields whose names are not known until runtime, for example Kubernetes pod labels.
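To illustrate what flattening means here, the following is a minimal standalone sketch, not the code in this PR. The `Value` enum, the `flatten_fields` helper, and the key naming scheme (`parent.child` for objects, `parent[0]` for array elements) are all assumptions made for the example; the sink's actual key format may differ.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a log event value; not Vector's real type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Recursively walk `value`, writing every scalar leaf into `out` under a
// path built from its ancestors. The path syntax is an assumption.
fn flatten(prefix: &str, value: &Value, out: &mut BTreeMap<String, Value>) {
    match value {
        Value::Object(map) => {
            for (key, child) in map {
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", prefix, key)
                };
                flatten(&path, child, out);
            }
        }
        Value::Array(items) => {
            for (index, child) in items.iter().enumerate() {
                flatten(&format!("{}[{}]", prefix, index), child, out);
            }
        }
        // Scalars are copied through unchanged.
        scalar => {
            out.insert(prefix.to_string(), scalar.clone());
        }
    }
}

// Flatten every top-level field of an event's indexed fields.
fn flatten_fields(fields: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    let mut out = BTreeMap::new();
    for (key, value) in fields {
        flatten(key, value, &mut out);
    }
    out
}
```

With this scheme, a field `labels` holding `{"app": "web"}` becomes the single top-level field `labels.app`, so label names do not need to be known in advance.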
I'd also just like to preface that I am a Rust novice and this is my first PR, so apologies if I have missed the mark here.
Vector configuration
How did you test this PR?
Tested against my own Splunk environment with the above config.
Verified the valid field inputs using a local instance of Splunk. Although the Splunk documentation states only strings and string arrays are valid, I found the following types were accepted:
- string
- float
- boolean
- null
- flat arrays consisting of any of the above types
Integers are silently dropped from the fields.
Nested arrays or objects resulted in the event being rejected (400 response code).
❯ curl -H "Authorization: Splunk TOKEN" -d '{"index": "my-index", "fields": {"string": "foo", "number": 1, "float": 1.23, "true": true, "false": false, "null": null, "flat_array": ["one", "2", false]}, "event": "Validation", "sourcetype": "manual"}' https://hec.local/services/collector
{"text":"Success","code":0}
❯ curl -H "Authorization: Splunk TOKEN" -d '{"index": "my-index", "fields": {"string": "foo", "number": 1, "float": 1.23, "true": true, "false": false, "null": null, "flat_array": ["one", "2", false], "nested_array": ["foo", ["bar"]]}, "event": "Validation", "sourcetype": "manual"}' https://hec.local/services/collector
{"text":"Error in handling indexed fields","code":15,"invalid-event-number":0}
❯ curl -H "Authorization: Splunk TOKEN" -d '{"index": "my-index", "fields": {"string": "foo", "number": 1, "float": 1.23, "true": true, "false": false, "null": null, "flat_array": ["one", "2", false], "nested_object": {"foo": "bar"}}, "event": "Validation", "sourcetype": "manual"}' https://hec.local/services/collector
{"text":"Error in handling indexed fields","code":15,"invalid-event-number":0}
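The observations above can be summarised as a small classification function. This is only a sketch of the behaviour I saw from my Splunk instance, not part of the sink; the `Value` and `HecOutcome` types and the `classify` function are hypothetical names for the example. Integers inside flat arrays were not covered by my tests, so treating them as accepted is an assumption.

```rust
// Hypothetical stand-in for a field value; not Vector's real type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

// What was observed to happen to a single entry in `fields`.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum HecOutcome {
    // The field is indexed.
    Accepted,
    // The request succeeds but the field is silently missing.
    Dropped,
    // The whole event is rejected with a 400 response (code 15).
    Rejected,
}

fn classify(value: &Value) -> HecOutcome {
    match value {
        Value::Integer(_) => HecOutcome::Dropped,
        Value::Object(_) => HecOutcome::Rejected,
        Value::Array(items) => {
            // Only flat arrays are accepted; any nesting rejects the event.
            // Assumption: integers inside a flat array are not checked here.
            let nested = items
                .iter()
                .any(|item| matches!(item, Value::Array(_) | Value::Object(_)));
            if nested {
                HecOutcome::Rejected
            } else {
                HecOutcome::Accepted
            }
        }
        // string, float, boolean and null were all accepted.
        _ => HecOutcome::Accepted,
    }
}
```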
I've maintained the previous Vector behaviour for all the existing working types (string, float, boolean, null), i.e. they do not get stringified. Integers, however, are now stringified so that they do not get dropped.
Note: Splunk itself has no types once the event has been indexed so there is no side effect of stringifying.
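The rule for scalar values can be sketched as follows. This is an illustration under assumed names (`Scalar`, `stringify_integer`), not the implementation in this PR: only integers are converted, and every other scalar passes through untouched.

```rust
// Hypothetical scalar type for the example; not Vector's real type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(String),
}

// Convert integers to their decimal string form so they are not dropped;
// leave string, float, boolean and null values as they are.
fn stringify_integer(value: Scalar) -> Scalar {
    match value {
        Scalar::Integer(number) => Scalar::String(number.to_string()),
        other => other,
    }
}
```

For example, `Scalar::Integer(42)` becomes `Scalar::String("42")`, while `Scalar::Float(1.23)` is returned unchanged.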
I've also added additional test cases for various types.
Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
Is this a breaking change?
- [ ] Yes
- [x] No
Does this PR include user facing changes?
- [x] Yes. Please add a changelog fragment based on our guidelines.
- [ ] No. A maintainer will apply the `no-changelog` label to this PR.
References
- Closes: #24152
Notes
- Please read our Vector contributor resources.
- Do not hesitate to use `@vectordotdev/vector` to reach out to us regarding this PR.
- Some CI checks run only after we manually approve them.
  - We recommend adding a `pre-push` hook, please see this template.
  - Alternatively, we recommend running the following locally before pushing to the remote branch:
    - `make fmt`
    - `make check-clippy` (if there are failures it's possible some of them can be fixed with `make clippy-fix`)
    - `make test`
- After a review is requested, please avoid force pushes to help us review incrementally.
  - Feel free to push as many commits as you want. They will be squashed into one before merging.
  - For example, you can run `git merge origin master` and `git push`.
- If this PR introduces changes Vector dependencies (modifies `Cargo.lock`), please run `make build-licenses` to regenerate the license inventory and commit the changes (if any). More details here.