vector icon indicating copy to clipboard operation
vector copied to clipboard

fix(dnstap source): implement all TCP dnstap options to reduce error

Open esensar opened this issue 8 months ago • 2 comments

Summary

The dnstap TCP source was initially built based on the TCP socket source, but not all of the options were correctly implemented, which left the request limiter unused. This also changes the multithreaded approach to do async-aware waits, because previous implementation used thread sleep in async context.

Change Type

  • [x] Bug fix
  • [ ] New feature
  • [ ] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes
  • [x] No

How did you test this PR?

Besides the included tests, this was also tested in real environments and it has not yet had errors like described in #20744.

Does this PR include user facing changes?

  • [x] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

References

  • Related: #20744

Sponsored by Quad9

esensar avatar May 30 '25 07:05 esensar

We've been running this patch in production on half a dozen large collectors (probably half a trillion records processed?) for several days, with no issues. This has fixed our problems with dnstap sockets falling over and not recovering. Very much in favor of inclusion.

johnhtodd avatar Jun 02 '25 15:06 johnhtodd

@pront The CI has failed due to 502 error. Can we restart it so this can get merged?

esensar avatar Jun 15 '25 12:06 esensar