Vector looks up the address in DNS even if the TTL is higher
Problem
We are running Vector version vector-0.40.0-1.x86_64 on Linux with the configuration below, which sends logs to Vector aggregators. The endpoint is on Envoy Proxy.
sinks:
  vector:
    type: vector
    #healthcheck: False
    address: "https://vector-nonprod.abc.com"
    compression: True
    inputs:
      - parsing
      - nginx
    batch:
      max_bytes: 10000
      max_events: 10000
    buffer:
      type: "disk"
      max_size: 268435488
    request:
      rate_limit_num: 30
      retry_attempts: 100
      timeout_secs: 5
      retry_max_duration_secs: 5
      retry_initial_backoff_secs: 1
      retry_jitter_mode: Full
Vector keeps querying DNS for vector-nonprod.abc.com all the time, generating far too many queries. It should cache DNS results itself or rely on the server's resolver configuration instead of going directly to DNS.
Below are some of the connections to our DNS server. This is only nonprod; in prod we see roughly 500 connections and around 300 queries per minute to DNS, which is putting a heavy load on our DNS servers. If there is a way to fix this, please advise.
netstat -n | grep 254
udp 0 0 10.10.10.17:28174 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:36843 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:47618 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:59961 10.10.10.254:53 ESTABLISHED
Configuration
sinks:
  vector:
    type: vector
    #healthcheck: False
    address: "https://vector-nonprod.abc.com:443"
    compression: True
    inputs:
      - parsing
      - nginx
    batch:
      max_bytes: 10000
      max_events: 10000
    buffer:
      type: "disk"
      max_size: 268435488
    request:
      rate_limit_num: 30
      retry_attempts: 100
      timeout_secs: 5
      retry_max_duration_secs: 5
      retry_initial_backoff_secs: 1
      retry_jitter_mode: Full
Version
vector 0.40.0 (x86_64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)
I have also tried specifying the port explicitly in the address, "https://vector-nonprod.abc.com:443", but the behavior is the same.
I think we discussed this in Discord a bit. I mentioned there that Vector does a DNS lookup every time it initiates a connection. However, even given that, it seems like you are seeing many more lookups than might be expected (it seems unlikely, but maybe possible, that Vector is initiating 500 connections per second).
Regardless, it does seem prudent for Vector to do DNS caching so I think adding that would be a reasonable way to address this issue.
Yes, I am hitting the same problem. My config:
[sinks.out]
type = "loki"
inputs = [ "remove_kafka_fields" ]
endpoint = "http://distributor-loki.my.com/"
out_of_order_action = "accept"
remove_timestamp = true
tenant_id = "myapp"
I used tcpdump to watch the traffic:
tcpdump -vvn port 53
It shows a very large number of DNS resolutions.
Just to add more here: we are also using Splunk HEC as a sink and it has a similar issue. We see too many DNS connections and queries, which is heavy on our DNS setup. It would be good to get a fix for this.
Hello, we don't have the capacity to get to this right now. We always welcome PRs and we do our best to review them ASAP.
In this instance, the solution Jesse mentioned seems like the best way to fix this issue. I would start looking at https://github.com/vectordotdev/vector/blob/master/src/dns.rs, potentially introducing a caching layer there. We might also want to expose some new config options for this, such as turning caching on/off and TTL for cache entries.
All in favor of such a modification to allow Vector to have some sense of a local cache for DNS lookups, though I would strongly warn against a TTL that is inside of Vector for DNS caching. DNS already has the concept of TTL, and layering a different (manual) TTL on top of that will be confusing and may lead to conflicting operational goals at different layers of the pointer resolution process. Please use DNS TTL as "the" TTL.
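To make the "use the DNS TTL as the TTL" idea concrete, here is a minimal sketch of a cache whose entries expire exactly when the record's own TTL runs out. It is not Vector's implementation and not tied to any resolver library: `DnsCache`, `CacheEntry`, the `enabled` flag and the method names are all hypothetical, and the caller is assumed to pass in the TTL it received with the DNS answer.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One cached answer: the resolved addresses plus the instant at which the
/// record's own TTL runs out.
struct CacheEntry {
    addrs: Vec<IpAddr>,
    expires_at: Instant,
}

/// Hypothetical in-process DNS cache (illustrative only, not Vector's code).
struct DnsCache {
    /// Hypothetical on/off switch, as suggested for the config options.
    enabled: bool,
    entries: HashMap<String, CacheEntry>,
}

impl DnsCache {
    fn new(enabled: bool) -> Self {
        DnsCache { enabled, entries: HashMap::new() }
    }

    /// Store an answer for exactly as long as the TTL returned with the record.
    /// No separate, manually configured TTL is layered on top.
    fn insert(&mut self, host: &str, addrs: Vec<IpAddr>, record_ttl: Duration, now: Instant) {
        if !self.enabled || record_ttl.is_zero() {
            return; // caching is off, or the record asks not to be cached
        }
        let entry = CacheEntry { addrs, expires_at: now + record_ttl };
        self.entries.insert(host.to_string(), entry);
    }

    /// Return the cached addresses while the entry is within its TTL.
    /// An expired entry is dropped so the caller performs a fresh lookup.
    fn get(&mut self, host: &str, now: Instant) -> Option<Vec<IpAddr>> {
        if !self.enabled {
            return None;
        }
        let expired = match self.entries.get(host) {
            Some(entry) => now >= entry.expires_at,
            None => return None,
        };
        if expired {
            self.entries.remove(host);
            return None;
        }
        self.entries.get(host).map(|entry| entry.addrs.clone())
    }
}
```

Passing `now` in explicitly keeps the expiry logic easy to test. With a 60 second record TTL, a connection opened 59 seconds after the lookup would reuse the cached addresses, and one opened at 60 seconds would trigger a new query.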
Sounds right 👍 I suppose this concept will come up during the PR review. @johnhtodd you are welcome to help review this feature whenever we have a PR.
Will jump on this in the next few days when I have a moment or two.
I'll likely be using Hickory's Resolver based on a recommendation from jszwedko in the Discord.
However that gets implemented, I'll be treating it as a hard requirement to be able to disable Vector's internal resolver and opt for a local resolver instead via config.
Just making this comment so y'all are aware I'm looking at this 🙂
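As a rough illustration of the "disable the internal resolver via config" requirement, the sketch below shows one possible shape for such a switch. Everything here is an assumption: `ResolverMode`, `parse_mode`, `resolve_with_system` and the option values `"internal"` and `"system"` are hypothetical names, not existing Vector options. The system path uses only the Rust standard library, which hands hostname lookups to the operating system's resolver; the internal (for example Hickory-based) path is left out.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical config switch; the variant and option names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ResolverMode {
    /// An in-process caching resolver (not implemented in this sketch).
    Internal,
    /// Defer to the host's resolver, so local caches and resolver settings apply.
    System,
}

/// Map a config string to a mode; unknown values are rejected.
fn parse_mode(value: &str) -> Option<ResolverMode> {
    match value {
        "internal" => Some(ResolverMode::Internal),
        "system" => Some(ResolverMode::System),
        _ => None,
    }
}

/// Resolve through the standard library, which asks the operating system
/// for hostnames and parses literal IP addresses without any DNS query.
fn resolve_with_system(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}
```

With a switch like this, operators who already run a local caching resolver on the host could select the system path and let that cache absorb the repeated lookups.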
Hi, is there any update on this one? Can this be looked into and fixed?
No updates from my side, but I'd be happy to see a PR using the Hickory resolver as described above.