fix(dnstap source): implement all TCP dnstap options to reduce error
Summary
The dnstap TCP source was initially built based on the TCP socket source, but not all of the options were correctly implemented, which left the request limiter unused. This also changes the multithreaded approach to do async-aware waits, because previous implementation used thread sleep in async context.
Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
Is this a breaking change?
- [ ] Yes
- [x] No
How did you test this PR?
Besides the included tests, this was also tested in real environments and it has not yet had errors like described in #20744.
Does this PR include user facing changes?
- [x] Yes. Please add a changelog fragment based on our guidelines.
- [ ] No. A maintainer will apply the "no-changelog" label to this PR.
Notes
- Please read our Vector contributor resources.
- Do not hesitate to use
@vectordotdev/vectorto reach out to us regarding this PR. - The CI checks run only after we manually approve them.
- We recommend adding a
pre-pushhook, please see this template. - Alternatively, we recommend running the following locally before pushing to the remote branch:
-
cargo fmt --all -
cargo clippy --workspace --all-targets -- -D warnings -
cargo nextest run --workspace(alternatively, you can runcargo test --all) -
./scripts/check_changelog_fragments.sh
-
- We recommend adding a
- After a review is requested, please avoid force pushes to help us review incrementally.
- Feel free to push as many commits as you want. They will be squashed into one before merging.
- For example, you can run
git merge origin masterandgit push.
- If this PR introduces changes Vector dependencies (modifies
Cargo.lock), please runcargo vdev build licensesto regenerate the license inventory and commit the changes (if any). More details here.
References
- Related: #20744
Sponsored by Quad9
We've been running this patch in production on half a dozen large collectors (probably half a trillion records processed?) for several days, with no issues. This has fixed our problems with dnstap sockets falling over and not recovering. Very much in favor of inclusion.
@pront The CI has failed due to 502 error. Can we restart it so this can get merged?