feat(transform): Add redis enrichment transformer

Open akutta opened this issue 2 months ago • 1 comments

Summary

Adds a transform that renders a templated key from each event, looks that key up in Redis, and enriches the event with the data stored there.

Vector configuration

Example Configurations:

No Local Caching

sources:
  s3_logs:
    type: aws_s3
    region: us-west-2
    compression: "gzip"
    decoding:
      codec: "json"
    sqs:
      queue_url: https://sqs.us-west-2.amazonaws.com/*/tmp-dkutta

transforms:
  enrich_data:
    inputs: ["s3_logs"]
    type: redis
    url: "redis://127.0.0.1:6379/0"
    key: "{{ application }}"
    output_field: "app_metadata"

sinks:
  black_hole:
    type: blackhole
    inputs:
      - enrich_data
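For reviewers, here is a rough sketch of the lookup flow this configuration describes, written in Python rather than taken from the PR's Rust code. Everything in it is an assumption for illustration: a plain dict stands in for Redis, `render_key` and `enrich` are hypothetical names, the template handling only covers simple `{{ field }}` references, and passing events through unchanged on a miss is a guess at the behavior, not something the PR states.

```python
import re

# Hypothetical stand-in for Redis: a dict mapping rendered keys to stored values.
FAKE_REDIS = {"checkout": "team=payments", "search": "team=discovery"}

# Matches simple "{{ field }}" references only (assumption: no nested paths).
TEMPLATE_FIELD = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_key(template, event):
    """Fill each {{ field }} from the event; return None if any field is absent."""
    missing = False

    def substitute(match):
        nonlocal missing
        value = event.get(match.group(1))
        if value is None:
            missing = True
            return ""
        return str(value)

    key = TEMPLATE_FIELD.sub(substitute, template)
    return None if missing else key


def enrich(event, store, key_template, output_field):
    """Return a copy of the event with the looked-up value under output_field.

    Assumption: events whose key cannot be rendered or is not in the store
    are passed through unchanged.
    """
    key = render_key(key_template, event)
    if key is None:
        return event
    value = store.get(key)
    if value is None:
        return event
    enriched = dict(event)
    enriched[output_field] = value
    return enriched


event = {"application": "checkout", "message": "order placed"}
print(enrich(event, FAKE_REDIS, "{{ application }}", "app_metadata"))
```

The arguments mirror the `key` and `output_field` options above, so the example event gains an `app_metadata` field while an event for an unknown application is left as it was.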

Enable LRU Caching:

transforms:
  enrich_data_cache:
    inputs: ["s3_logs"]
    type: redis
    url: "redis://127.0.0.1:6379/0"
    key: "{{ application }}"
    output_field: "app_metadata"
    cache_max_size: 10000
    cache_ttl: 10000
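To make the intent of `cache_max_size` and `cache_ttl` concrete, below is a minimal Python sketch of an LRU cache with per-entry expiry placed in front of a lookup. It is not the PR's implementation: `TtlLruCache` and `cached_lookup` are hypothetical names, the TTL unit is left abstract because the description does not state one, and the choice not to cache misses is an assumption.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """LRU cache whose entries also expire after `ttl` clock units."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used


def cached_lookup(cache, store, key):
    """Check the cache first and fall back to the store (a dict standing in for Redis)."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.put(key, value)  # assumption: only hits are cached
    return value
```

One design point this sketch surfaces: if misses are not cached, events whose key is absent from Redis still cost a network round trip every time. With roughly 70% of test events having keys, the remaining share would bypass the cache entirely under that assumption.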

How did you test this PR?

Test Setup:

  • Local Machine: M3 MacBook
  • Local Redis running in Docker
    • ~70% of log events had matching keys in Redis with content to enrich the event.

I ran local builds of vector to validate functionality and performance.

Results:

using cargo run

  • No Transform: ~100k logs/s
  • Remap Transform: ~90k logs/s (a simple remap that just adds a property to each event)
  • Redis Transform no Cache: ~70k logs/s
  • Redis Transform w/ Cache: ~65k logs/s
    • This surprised me: I expected higher throughput from reducing network I/O. The cache will likely be more useful for Redis clusters with higher network I/O costs.
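For a quick read of the `cargo run` figures, the snippet below converts them into percentage drops relative to the no-transform baseline. The numbers are the approximate ones reported above; `slowdown_percent` is just a helper name chosen for this example. Note that `cargo run` uses the unoptimized dev profile unless `--release` is passed, which, assuming these runs used the default, would explain why the gaps shrink in the release results.

```python
# Approximate throughput from the cargo run results, in logs per second.
BASELINE = 100_000
measured = {
    "remap": 90_000,
    "redis_no_cache": 70_000,
    "redis_with_cache": 65_000,
}


def slowdown_percent(baseline, rate):
    """Percentage drop in throughput relative to the baseline."""
    return round(100 * (baseline - rate) / baseline, 1)


for name, rate in measured.items():
    print(f"{name}: {slowdown_percent(BASELINE, rate)}% below baseline")
# remap: 10.0% below baseline
# redis_no_cache: 30.0% below baseline
# redis_with_cache: 35.0% below baseline
```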

using release built artifact

  • No Transform: ~90% CPU | 315 MB
  • Remap Transform: ~130% CPU | 560 MB
  • Redis Transform no cache: ~280% CPU | 618 MB
  • Redis Transform w/ Cache: ~250% CPU | 670 MB

In release mode, the performance differences became negligible against a locally hosted Redis server. CPU utilization for the Redis transform was roughly double that of a simple remap.
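The "roughly double" figure can be checked directly from the release-build numbers. This is only arithmetic on the approximate values reported above; `ratio` is a helper name chosen for the example.

```python
# Approximate release-build measurements: (CPU percent, memory in MB).
remap = (130, 560)
redis_no_cache = (280, 618)
redis_with_cache = (250, 670)


def ratio(value, reference):
    """How many times larger value is than reference, to two decimals."""
    return round(value / reference, 2)


print("CPU vs remap, no cache:", ratio(redis_no_cache[0], remap[0]))      # 2.15
print("CPU vs remap, with cache:", ratio(redis_with_cache[0], remap[0]))  # 1.92
print("Memory vs remap, no cache:", ratio(redis_no_cache[1], remap[1]))   # 1.1
print("Memory vs remap, with cache:", ratio(redis_with_cache[1], remap[1]))  # 1.2
```

In these runs the cache traded a little CPU (about 1.9x instead of 2.15x the remap baseline) for somewhat more memory.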

Change Type

  • [ ] Bug fix
  • [x] New feature
  • [ ] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes
  • [x] No

Does this PR include user facing changes?

  • [x] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

akutta, Dec 05 '25 01:12

Hi there, thanks for this PR. Adding a "do not merge" from the docs team until the Vector team approves these changes. Let us know once this is ready :)

iadjivon, Dec 05 '25 21:12